Skip to contents

Setup

First, install and load the ASySD package.

# devtools::install_github("camaradesuk/ASySD")
library(ASySD)

Loading Citation Data

Begin by loading your citation data from an Endnote XML file using the load_search() function. You can specify alternative file formats such as CSV if needed.

citations <- load_search("systematic_search.xml", method="endnote")

Automated deduplication

Remove duplicate citations automatically using the dedup_citations function.

results <- dedup_citations(citations, merge_citations = TRUE)
#> formatting data...
#> identifying potential duplicates...
#> identified duplicates!
#> flagging potential pairs for manual dedup...
#> Joining with `by = join_by(duplicate_id.x, duplicate_id.y)`
#> 8948 citations loaded...
#> 3508 duplicate citations removed...
#> 5440 unique citations remaining!

The dedup_citations function returns a list of two dataframes by default. The first contains unique citations after duplicates were removed automatically by ASySD. In most cases, this will remove the vast majority of duplicates. There will likely be some duplicates remaining which need manual review by a human (see next step).

unique_citations <- results$unique

Manual deduplication

To address remaining duplicates, review potential pairs manually. You can examine them within R or export them to a CSV or Excel file for detailed inspection.

potential_duplicates <- results$manual_dedup

After reviewing the pairs, create a dataframe containing only the true duplicate pairs. Here, for simplicity, I have retained a subset of the potential_duplicates dataframe. If you exported to a file, you could remove the rows which are not duplicates and re-upload the rows with true duplicates.

true_duplicates <- potential_duplicates[1:16,]

Final Deduplication

Combine the results of automated and manual deduplication using the dedup_citations_add_manual() function. Include any additional duplicates you’ve identified in the additional_pairs argument.

final_results <- dedup_citations_add_manual(unique_citations, additional_pairs = true_duplicates)
#> Joining with `by = join_by(record_id)`

Exporting Results

You can now write your results to a file for import into a reference manager or systematic review software

write_citations(final_results, type="txt", filename="citations.txt")