Updating a Systematic Search • ASySD

Setup

First, install and load the ASySD package.

# devtools::install_github("camaradesuk/ASySD")library(ASySD)
library(ASySD)

Loading citation data

Load citations from an existing search file using the load_search() function. In this example, we use a csv format.

existing_search <- load_search("old_sr_search.csv")

Load citations from a new systematic search.

new_search <- load_search("new_sr_search.csv")

Combine old and new citation data

Before deduplication, we must bind the citations into one dataframe. First, give each search a different source so that we can specify which citations to retain.

existing_search$source <- "old"
new_search$source <- "new"

all_citations <- plyr::rbind.fill(existing_search, new_search)

Automated deduplication

Remove duplicate citations automatically using the dedup_citations function. Here we have specified the argument merge=TRUE to indicate that we want to merge duplicate records and have a record of which citations have been merged into one. We have specified in the keep_source argument that we wish to preferentially retain old citations. In practice, this means that the duplicate_id chosen for a set of records will preferentially be the record_id of a citation in the OLD systematic search. This is to facilitate easy record linkage - see later.

results <- dedup_citations(all_citations, merge_citations = TRUE, keep_source = "old")
#> formatting data...
#> identifying potential duplicates...
#> identified duplicates!
#> flagging potential pairs for manual dedup...
#> Joining with `by = join_by(duplicate_id.x, duplicate_id.y)`
#> 8972 citations loaded...
#> 472 duplicate citations removed...
#> 8500 unique citations remaining!

The dedup_citations function returns a list of two dataframes by default. The first contains unique citations after duplicates were removed automatically by ASySD. In most cases, this will remove the vast majority of duplicates. There will likely be some duplicates remaining which need manual review by a human (see next step).

unique_citations <- results$unique

Manual deduplication

To check for additional duplicates, get the dataframe of citations for manual review. You can review within R or export as a csv / excel file to go through each row of pairs.

potential_duplicates <- results$manual_dedup

After reviewing the pairs, create a dataframe contianing only the true duplicate pairs. Here, all the suggested duplicates look like REAL duplicates. Alternatively, you could go through them one-by-one using the manual_dedup_shiny() function.

true_duplicates <- potential_duplicates

Now, to get the final deduplication results, use the dedup_citations_add_manual()function. To account for additional duplicates you have reviewed, add them into the additional_pairs argument.

final_results <- dedup_citations_add_manual(unique_citations, additional_pairs = true_duplicates, merge_citations = TRUE, keep_source = "old")
#> Joining with `by = join_by(record_id)`

Find new citations identified in update

Now we have a final set of unique citations, how can we find the new citations we added with our latest systematic search?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
new_citations <- final_results %>%
  filter(source == "new") 

new_citations %>%
   tail(3) %>%
   gt::gt() %>%
   gt::cols_hide(c(abstract))

duplicate_id	author	year	journal	doi	title	pages	volume	number	isbn	label	source	url	...1	uid	keywords	secondarytitle	issn	pmid	ptype	author_country	author_affiliation	record_ids
wos:000931052000002	Lipton, Stuart A.	2022	FREE RADICAL BIOLOGY AND MEDICINE	10.1016/j.freeradbiomed.2022.10.272	Hidden networks of aberrant protein transnitrosylation contribute to synapse loss in Alzheimer's disease	171-176	193	NA	NA	270223	new	NA	5561	wos:000931052000002	NA	NA	0891-5849	NA	NA	NA	NA	wos:000931052000002
wos:000931426100001	Marini, Sandro; Chung, Jaeyoon; Han, Xudong; Sun, Xinyu; Parodi, Livia; Farrer, Lindsay A.; Rosand, Jonathan; Romero, Jose Rafael; Anderson, Christopher D.	2023	INTERNATIONAL JOURNAL OF STROKE	10.1177/17474930231155816	Pleiotropy analysis between lobar intracerebral hemorrhage and CSF beta-amyloid highlights new and established associations	NA	NA	NA	NA	270223	new	NA	9861	wos:000931426100001	ICH \| beta-amyloid \| pleiotropy \| genetic epidemiology \| cadherin \| cerebral amyloid angiopathy	NA	1747-4930	NA	NA	NA	NA	wos:000931426100001
wos:000932018500001	Walker, Keenan A.; Duggan, Michael R.; Gong, Zhaoyuan; Dark, Heather E.; Laporte, John P.; Faulkner, Mary E.; An, Yang; Lewis, Alexandria; Moghekar, Abhay R.; Resnick, Susan M.; Bouhrara, Mustapha	2023	ANNALS OF CLINICAL AND TRANSLATIONAL NEUROLOGY	10.1002/acn3.51730	Kidney and lung crosstalk during critical illness: large-scale cohort study (FEB, 10.1007/s40620-023-01594-z, 2023)	NA	NA	NA	NA	270223	new	NA	6501	wos:000932018500001	NA	NA	2328-9503	NA	NA	NA	NA	wos:000932018500001

Lets also have a look at the citations identified in both searches by removing citations with a single source.

crossover <- final_results %>%
  filter(!source == "new") %>%
  filter(!source == "old") 

crossover %>%
   tail(3) %>%
   gt::gt() %>%
   gt::cols_hide(c(abstract))

duplicate_id	author	year	journal	doi	title	pages	volume	number	isbn	label	source	url	...1	uid	keywords	secondarytitle	issn	pmid	ptype	author_country	author_affiliation	record_ids
wos:000919545100001	Larkin, Howard D. D.	2023	JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION	10.1001/jama.2022.24490	Lecanemab Gains FDA Approval for Early Alzheimer Disease	363	329	NA	NA	270223, 270223	new, new	NA	3271	wos:000919545100001	NA	NA	0098-7484	36652625	Article	NA	NA	wos:000919545100001, scopus-2-s2.0-85147720543
wos:000924510300006	Chen, Shanquan; Price, Annabel C.; Cardinal, Rudolf N.; Moylett, Sinead; Kershenbaum, Anne D.; Fitzgerald, James; Mueller, Christoph; Stewart, Robert; O'Brien, John T.	2022	PLOS MEDICINE	10.1371/journal.pmed.1004124	Association between antidementia medication use and mortality in people diagnosed with dementia with Lewy bodies in the UK: A retrospective cohort study	NA	19	NA	NA	270223, 270223	new, new	NA	9981	wos:000924510300006	NA	NA	1549-1277	NA	NA	NA	NA	wos:000924510300006, wos:000925010400002
wos:000928044600001	Wang, Lin-Yu; Liu, Jiao; Peng, Yi-Zhu; Zhang, Cai-Ping; Zou, Wei; Liu, Feng; Zhan, Ke-Bin; Zhang, Ping	2022	NATURAL PRODUCT COMMUNICATIONS	10.1177/1934578X221141162	Curcumin-Nicotinate Attenuates Hippocampal Synaptogenesis Dysfunction in Hyperlipidemia Rats by the BDNF/TrkB/CREB Pathway: Involving Idol/LDLR Signaling to Eliminate A beta Deposition	NA	17	NA	NA	270223, 270223	new, new	NA	9791	wos:000928044600001	hyperlipidemia \| high-fat diet \| Curcumin-Nicotinate \| amyloid-beta \| BDNF \| TrkB \| CREB signaling \| synaptogenesis \| Idol/LDLR pathway	NA	1934-578X	NA	NA	NA	NA	wos:000928044600001, wos:000922862000001

To keep good records, we don’t want to lose track of identifiers for studies we have already included in a review. This is why specifying the citation to keep was important! To illustrate this, look specifically at the citations present in the old search.

old_citations <- final_results %>%
  filter(grepl("old", source)) # find all citations in old search

We can check that the duplicate ids here refer to the original record id in the existing_citations dataframe we imported. As you can see, they are all present. In the record_ids column you can see the different record_ids that have merged into a single citation. In case you make a mistake or don’t specify the record_id to keep as the duplicate_id, you can use these to trace back your citations to the original dataframes.

old_citations_check <- old_citations %>%
  filter(duplicate_id %in% existing_search$record_id) #check that all citations use the OLD record_id as the duplicate_id

crossover %>%
   tail(3) %>%
   gt::gt() %>%
   gt::cols_hide(c(abstract))

duplicate_id	author	year	journal	doi	title	pages	volume	number	isbn	label	source	url	...1	uid	keywords	secondarytitle	issn	pmid	ptype	author_country	author_affiliation	record_ids
wos:000919545100001	Larkin, Howard D. D.	2023	JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION	10.1001/jama.2022.24490	Lecanemab Gains FDA Approval for Early Alzheimer Disease	363	329	NA	NA	270223, 270223	new, new	NA	3271	wos:000919545100001	NA	NA	0098-7484	36652625	Article	NA	NA	wos:000919545100001, scopus-2-s2.0-85147720543
wos:000924510300006	Chen, Shanquan; Price, Annabel C.; Cardinal, Rudolf N.; Moylett, Sinead; Kershenbaum, Anne D.; Fitzgerald, James; Mueller, Christoph; Stewart, Robert; O'Brien, John T.	2022	PLOS MEDICINE	10.1371/journal.pmed.1004124	Association between antidementia medication use and mortality in people diagnosed with dementia with Lewy bodies in the UK: A retrospective cohort study	NA	19	NA	NA	270223, 270223	new, new	NA	9981	wos:000924510300006	NA	NA	1549-1277	NA	NA	NA	NA	wos:000924510300006, wos:000925010400002
wos:000928044600001	Wang, Lin-Yu; Liu, Jiao; Peng, Yi-Zhu; Zhang, Cai-Ping; Zou, Wei; Liu, Feng; Zhan, Ke-Bin; Zhang, Ping	2022	NATURAL PRODUCT COMMUNICATIONS	10.1177/1934578X221141162	Curcumin-Nicotinate Attenuates Hippocampal Synaptogenesis Dysfunction in Hyperlipidemia Rats by the BDNF/TrkB/CREB Pathway: Involving Idol/LDLR Signaling to Eliminate A beta Deposition	NA	17	NA	NA	270223, 270223	new, new	NA	9791	wos:000928044600001	hyperlipidemia \| high-fat diet \| Curcumin-Nicotinate \| amyloid-beta \| BDNF \| TrkB \| CREB signaling \| synaptogenesis \| Idol/LDLR pathway	NA	1934-578X	NA	NA	NA	NA	wos:000928044600001, wos:000922862000001

Exporting results

Once deduplication is complete, you can export the new unique records to a file for import into reference managers or systematic review software.

write_citations(new_citations, type="txt", filename="unique.txt")