Characterizing the Vector Data Ecosystem

A growing body of information on vector-borne diseases has arisen as research has increasingly focused on anticipating risk, optimizing surveillance, and understanding the fundamental biology of these diseases in order to direct control and mitigation efforts. The scope and scale of this information, which spans database efforts, data storage, and data-serving approaches, mean that it is distributed across many formats and data types. Data range from collections records to molecular characterization, from geospatial data to interactions of vectors and traits, and from infection experiments to field trials. New initiatives arise, often spanning efforts traditionally siloed in specific research disciplines, while other efforts wane, perhaps in response to funding declines, changing research directions, or lack of sustained interest. Thus, the world of vector data (the Vector Data Ecosystem) can become unclear in scope, and the flows of data through these various efforts can be stymied by obsolescence, or simply by gaps in access and interoperability. As increasing attention is paid to creating FAIR (Findable, Accessible, Interoperable, and Reusable) data, simply characterizing what is ‘out there’, and how existing data aggregation and collection efforts interact or interoperate with each other, is a useful exercise. This website and related project present a snapshot of current vector data efforts, with a brief description of each effort’s stated scope, purpose, and level of accessibility. We welcome additions to this set, following a similar template, to improve this resource for the larger vector ecology and vector-borne disease research and practitioner community.

We invite you to view our blog post about this work and to see the publication on the database, which includes more context and commentary.

(FO) Fully Open Database
(PA) Partially Accessible Database

Global Biodiversity Information Facility (GBIF) depositors and aggregators

GBIF

GBIF is a worldwide repository of species occurrence records from across the tree of life, holding over 1 billion records.

Note that a large majority of the records here come from the other depositors/aggregators in this section (FO)
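For readers who want to pull occurrence records programmatically, the following is a minimal sketch of how a query against GBIF's public occurrence search endpoint might be assembled. It assumes the v1 REST API at api.gbif.org and its `scientificName`, `country`, and `limit` parameters; the helper name `build_occurrence_query` is hypothetical and is not part of this project. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL of the GBIF v1 occurrence search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(scientific_name, country=None, limit=20):
    """Return a GBIF occurrence-search URL for one taxon.

    Hypothetical helper for illustration only.
    scientific_name: taxon name, e.g. "Aedes aegypti"
    country: optional two-letter ISO country code, e.g. "US"
    limit: maximum number of records to request per page
    """
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        params["country"] = country
    # urlencode percent-encodes values and joins them with "&".
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_occurrence_query("Aedes aegypti", country="US", limit=5))
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response format and paging behavior should be checked against GBIF's own API documentation.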

GBIF.us (Formerly BISON)

GBIF.us is the official US Node of GBIF, focusing on government collections and invasive species in the United States, US associated territories, and Canada. (Replaces BISON) (FO)

SCAN

SCAN serves as a regional GBIF node specializing in providing arthropod occurrence data, aggregating records from over 225 data providers in North America. Providers include collections maintained by academic institutions, natural history museums, government agencies, and more (FO)

NEON

NEON monitors ecosystems across the US, providing time series and abundance data for species across the project’s field sites, which include 47 terrestrial sites (FO)

iDigBio

iDigBio is the National Resource for Advancing Digitization of Biodiversity Collections (ADBC), an NSF-funded initiative to digitize museum holdings (FO)

VertNet

VertNet is an NSF-funded collaboration to streamline the availability of vertebrate biodiversity data, which may include arthropods associated with records (e.g., parasites) (FO)

BugGuide

BugGuide provides community science occurrence data focused on insects, spiders, and related arthropods (FO)

iNaturalist

iNaturalist solicits observations from the public that are identified by users on the platform (FO)

State/regional vector surveillance databases

VectorSurv

The Vectorborne Disease Surveillance System (VectorSurv) is the umbrella name for a family of state-specific branded web services for vector control and public health agencies in the US. VectorSurv has regional sites for California, Arizona, Hawaii, Nebraska, New Jersey, North Carolina, North Dakota, South Dakota, Tennessee, Utah, and Washington. Arboviral surveillance data, such as sentinel animal data, are freely viewable through VectorSurv Maps (PA)

Iowa Mosquito Surveillance

The State of Iowa provides a centralized database of mosquito surveillance that is available online through a partnership between the Iowa State University Medical Entomology Laboratory and the Iowa Department of Public Health. Mosquito surveillance data, including mosquito population abundance data for a wide variety of species, are available from 1969 (FO)

Trait databases

TPT-TCN

The Terrestrial Parasite Tracker - Thematic Collection Network (TPT-TCN) is a new project funded by the NSF’s ADBC program to facilitate the digitization of arthropod ectoparasite and vector specimens held in natural history museum collections

ArboNET

The National Arbovirus Surveillance System (ArboNET) relies on passive surveillance, such as clinician diagnosis, testing, and reporting to local public health authorities. Reported data include human arboviral disease cases and non-human infections from mosquito populations, veterinary cases, wildlife, and sentinel surveillance animals (PA)

VectorNet

The European Network for Medical and Veterinary Entomology (VectorNet) is a joint initiative of the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC). The project supports the collection of data on vectors and on pathogens in vectors, as they relate to both animal and human health (PA)

VectorMap-GR

VectorMap-GR is a geographically restricted database of mosquito populations, limited to Crete (PA)

VectorByte VecTraits Database

VecTraits hosts curated ecological trait data for vectors and some pathogens, such as temperature-dependent growth and survival rates, fecundity, vector competence, and more (FO)

WingBank

WingBank is a database of over 10,000 images of mosquito wings that could have applications for AI-driven mosquito species identification (FO)

Abuzz

Abuzz maintains a database with recordings of mosquito wing beat frequencies, which are used for identification (FO)

Omics databases

VectorBase

VectorBase has extensive genomics data and tools, with hundreds of datasets deposited spanning genome assemblies, proteomics, population genetics, and expression data. Part of VEuPathDB (FO)

MosquitoDB

MosquitoDB is an African-led project, maintained by the Pan-African Mosquito Control Association, to collate mosquito data, primarily from national malaria control programmes.

ClinEpiDB

ClinEpiDB curates data from large (human) epidemiological field trials, some of which have paired vector data (PA)

MosquitoAlert

MosquitoAlert is a non-profit citizen science project in which members of the public submit pictures of mosquitoes and larval sites using a mobile phone app. Data are openly accessible online and available for download through the Mosquito Alert Data Portal (FO)

Global Vector Hub (GVH)

Global Vector Hub maintains a database of vector disease researchers; resources (e.g., policy/guideline documents); and georeferenced mosquito data, including species presence, blood meal host, insecticide resistance status, and pathogen status.

VectorAtlas

The VectorAtlas project provides access to malaria vector occurrence records and to downloadable species occurrence model outputs.

VectorMap

The VectorMap Data Portal holds well-curated, high-confidence geospatial species occurrence data for a wide variety of medically important arthropod taxa, including mosquitoes (MosquitoMap), ticks (TickMap), fleas (FleaMap), mites (MiteMap), biting midges (MidgeMap), and sandflies (SandFlyMap). VectorMap is a product of the Walter Reed Biosystematics Unit (WRBU), a partnership between the Walter Reed Army Institute of Research (WRAIR) and the Smithsonian Institution National Museum of Natural History (NMNH).

BOLD

The Barcode of Life Data System (BOLD) is a storage and analysis platform for DNA barcode records. Developed at the Centre for Biodiversity Genomics in Canada, the platform offers tools for management, analysis, and identification, in addition to assembly and organization of sequence data. Though not limited to arthropod vectors, BOLD provides an extensive resource for georeferenced molecular data (FO)

Ag1000G

The Anopheles 1000 Genomes project produces whole genome sequence datasets. Started in 2014, Ag1000G aims to use whole-genome deep sequencing on large numbers of wild-caught An. gambiae to improve understanding of natural genetic variation as it relates to ecology and malaria epidemiology (FO)

Other databases

Tick Report

Tick Report is a commercial testing service to detect pathogens in user-submitted tick samples. Tick Report makes data and summary statistics from their testing program available online (PA)

MAP

The Malaria Atlas Project (MAP) hosts interactive mapping, trend visualization tools, and data directories for malaria and associated mosquito vectors. Data on vector occurrence, malaria prevalence, and covariates are generally available as spatial layers, downloadable through the platform’s Data Explorer mapping interface. Model outputs of risk and predicted geographic vector ranges are also available through this platform as layers (PA)

IR Mapper

IR Mapper is an online, interactive mapping tool that displays insecticide resistance testing data for Anopheles species and for two arboviral vectors, Aedes aegypti and Ae. albopictus. Data are viewable through interactive mapping functions and are mostly obtained on a monthly basis from peer-reviewed published literature, although other sources of insecticide resistance data, such as published reports, are also used (PA)

Malaria Threat Maps

Malaria Threat Map is an interactive data and mapping platform produced by the WHO. This database specializes in biological challenges to malaria control and elimination, such as vector insecticide resistance and parasite drug resistance. For Anopheline vectors, insecticide resistance phenotype data based on WHO assays and maps of invasive malaria vector occurrence are viewable (PA)

EDWIP

The Ecological Database of the World’s Insect Pathogens (EDWIP) contains associations of pathogens with insects and other arthropods.