# Data sources
Many types of data sources can be used to enhance information controls research. Five particularly relevant sources are discussed below: OONI, RIPE, Wehe, Rapid7 Labs, and Censored Planet. A list of additional sources that can help detect censorship events is also provided.
# OONI
OONI, the Open Observatory of Network Interference, is a global observation network and free software project used to detect censorship, surveillance, and traffic manipulation on the Internet. OONI uses Free and Open Source Software (FOSS) to share observations and data about network interference. Since 2012, OONI has collected millions of network measurements from more than 200 countries around the world. It serves as a powerful resource for researchers, journalists, lawyers, activists, and advocates interested in exploring network anomalies. Interested researchers can obtain and analyze OONI data through OONI Explorer and the OONI API.
# OONI Explorer
OONI Explorer is an easy way to access and review data that has been gathered by other OONI users. It provides a graphical data repository per country, allowing anyone to explore and interact with the network measurements that have been collected through OONI probes. With it, users can:
- Perform fast queries of OONI data.
- View which websites were most recently blocked in each country.
- Review measurement coverage by test class and tested URL.
- Search within all OONI data with different criteria (blocked URLs, anomalies, ASN).
- Limitation: Running queries and analyzing results can be time-consuming when they span a wide range of reports.
# OONI measurements API
The OONI API offers a programmatic way to access, download, and search OONI data. With it, users can:
- Access complete data (raw network measurements in JSON file format).
- Perform fast data analysis.
- Limitation: Data needs to be downloaded (sufficient storage, network bandwidth required).
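The programmatic access described above can be sketched in Python. This is a minimal illustration, not an official client: it assumes the public measurement-listing endpoint at `https://api.ooni.io/api/v1/measurements`, the query parameters `probe_cc`, `test_name`, `since`, `until`, and `limit`, and a JSON response whose `results` entries carry `input` and `anomaly` fields. The helper names (`build_query_url`, `fetch_results`, `count_anomalies`) are hypothetical. Check the current OONI API documentation before relying on any of these details.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint; verify against the current OONI API documentation.
API_BASE = "https://api.ooni.io/api/v1/measurements"


def build_query_url(country_code, test_name, since, until, limit=100):
    """Build a measurement-listing URL; dates are given as YYYY-MM-DD."""
    params = {
        "probe_cc": country_code,   # two-letter country code of the probe
        "test_name": test_name,     # e.g. "web_connectivity"
        "since": since,
        "until": until,
        "limit": limit,
    }
    return f"{API_BASE}?{urlencode(params)}"


def fetch_results(url, timeout=30):
    """Download one page of measurement metadata (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response).get("results", [])


def count_anomalies(results):
    """Count anomalous measurements per tested input (usually a URL)."""
    counts = Counter()
    for entry in results:
        if entry.get("anomaly"):
            counts[entry.get("input")] += 1
    return counts
```

A typical use would be `count_anomalies(fetch_results(build_query_url("IR", "web_connectivity", "2020-01-01", "2020-01-02")))`. Note that an anomaly is only a signal that something looked unusual; it is not confirmation of blocking.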
Additional examples of how to use OONI data can be found in the Data analysis section of the magma guide: How to use OONI data.
# RIPE
The RIPE NCC (RIPE Network Coordination Centre) is the regional Internet registry for Europe, the Middle East, and parts of Central Asia. As such, it allocates and registers blocks of Internet number resources to Internet service providers (ISPs) and other organizations. The not-for-profit organization works to support the RIPE (Réseaux IP Européens) community and the wider Internet community. Its membership consists primarily of Internet service providers, telecommunication organizations, and large corporations.
The RIPE NCC provides a variety of data sources, including public measurements and BGP announcement data. Relevant tools and data include:
- BGPlay (an advanced RIPEstat widget to visualize BGP routing information)
- Routing Information Service (RIS) raw data
- Global certificate and ROA statistics
# Wehe
Wehe is a research project based out of Northeastern University, the University of Massachusetts – Amherst, and Stony Brook University that collects data on ISP traffic differentiation (typically bandwidth throttling). The project performs network measurements for popular applications such as YouTube, Netflix, Amazon Prime Video, Spotify, Skype, and NBC Sports.
# Rapid7 Labs
Rapid7 Labs is the research arm of Rapid7. Its website offers “researchers and community members open access to data from Project Sonar, which conducts internet-wide surveys to gain insights into global exposure to common vulnerabilities.” Key datasets are detailed below:
# 'FDNS' dataset
The Forward DNS (FDNS) dataset contains the responses to DNS requests for all forward DNS names known by Rapid7's Project Sonar. Until early November 2017, all of these were requests for the 'ANY' record, with fallback A and AAAA requests if necessary. After that time, the ANY study represents only the responses to ANY requests, and dedicated studies were created for the A, AAAA, CNAME, and TXT record lookups, with appropriately named files. Each file is GZIP-compressed and contains, in JSON format, the name, type, value, and timestamp of any records returned for a given name.
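Because FDNS files are large, it usually makes sense to stream them rather than decompress them fully. The sketch below filters records for one domain and its subdomains. It assumes one JSON object per line and lowercase type values such as `"a"`; the field names `name`, `type`, `value`, and `timestamp` come from the description above, while the function name `iter_fdns_records` is hypothetical.

```python
import gzip
import json


def iter_fdns_records(path, suffix, record_type="a"):
    """Yield records of one type whose name is the suffix or a subdomain of it."""
    suffix = suffix.lower().rstrip(".")
    # Open in text mode so the file is decompressed and decoded line by line.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the scan
            if record.get("type") != record_type:
                continue
            name = record.get("name", "").lower().rstrip(".")
            # Match on a label boundary so "notexample.org" is not a hit
            # when searching for "example.org".
            if name == suffix or name.endswith("." + suffix):
                yield record
```

Since the function is a generator, memory use stays roughly constant regardless of file size, and results can be consumed as they are found.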
# 'RDNS' dataset
The Reverse DNS (RDNS) dataset includes the responses to IPv4 PTR lookups for all IPv4 addresses that are neither blacklisted nor private.
# 'HTTP' dataset
HTTP GET Responses dataset contains the responses to HTTP/1.1 GET requests performed against a variety of IPv4 public HTTP endpoints.
# 'HTTPS' dataset
HTTPS GET Responses dataset contains the responses to HTTP/1.1 GET requests against various HTTPS ports.
# 'SSL' datasets
# Common port (443) SSL dataset
SSL Certificates dataset contains X.509 certificate metadata observed when communicating with HTTPS endpoints.
# Non-443 port SSL dataset
SSL Certificates (non-443) dataset includes the X.509 certificate metadata observed when communicating with miscellaneous non-HTTPS endpoints, such as IMAPS, POP3S, or other services.
# 'UDP Scans' dataset
The UDP Scans dataset contains regular snapshots of the responses to ZMap probes against common UDP services.
# 'TCP Scans' dataset
The TCP Scans dataset contains regular snapshots of the responses to ZMap probes against common TCP services.
# Censored Planet
Censored Planet is a project from the University of Michigan that collects data on privacy and security violations on the Internet. Key datasets are detailed below:
# 'Satellite' DNS dataset
Satellite contains regular snapshots of DNS resolutions of top websites, as returned by a large number of open DNS resolvers located in a wide range of networks.
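One way to work with this kind of multi-resolver data is to compare each resolver's answers against what most other resolvers return for the same domain. The sketch below illustrates that idea only. The `(resolver, domain, answers)` tuple shape, the function name `flag_outlier_resolvers`, and the `min_share` threshold are all assumptions made for illustration and do not reflect the actual Satellite file format or Censored Planet's own analysis method.

```python
from collections import Counter, defaultdict


def flag_outlier_resolvers(observations, min_share=0.5):
    """Return {domain: [resolvers sharing no IP with the consensus set]}.

    observations: iterable of (resolver, domain, answers) tuples, where
    answers is a collection of IP strings (hypothetical input shape).
    """
    by_domain = defaultdict(list)
    for resolver, domain, answers in observations:
        by_domain[domain].append((resolver, frozenset(answers)))

    flagged = {}
    for domain, rows in by_domain.items():
        # Count how many resolvers returned each IP for this domain.
        ip_counts = Counter(ip for _, answers in rows for ip in answers)
        # An IP is "consensus" if at least min_share of resolvers returned it.
        consensus = {ip for ip, n in ip_counts.items() if n / len(rows) >= min_share}
        # Resolvers with no overlap (including empty answers) are flagged.
        outliers = sorted(r for r, answers in rows if not (answers & consensus))
        if outliers:
            flagged[domain] = outliers
    return flagged
```

A flagged resolver is only a candidate for closer inspection. Content delivery networks and geographic load balancing routinely return different addresses to different resolvers, so disagreement alone does not demonstrate DNS manipulation.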
# 'Quack' HTTP dataset
Quack contains regularly collected responses observed when connecting to infrastructural web servers (e.g. those operated by ISPs and governments) and asking each web server to serve content from a range of sensitive domains.
# Other sources
The following data sources can be used to help detect a censorship event that is ongoing or has already taken place.
- Center for Applied Internet Data Analysis (CAIDA): Internet Outage Detection and Analysis (IODA)
- APNIC DNS Resolver Dashboard: measures the DNS recursive resolvers used in various countries and networks (described in an accompanying blog post)
- Dyn Research: Outages Bulletin
- Internet-Wide Scan Data Repository: Longterm DNS survey
- NLnet Labs RPKI Analytics
- Cloudflare Cirrus: publicly audits the TLS/SSL certificates issued by certificate authorities
- Google Product Traffic data (via Google Transparency Reports)
- Google Trends: find trending searches worldwide or per country
- Internet Intelligence Map
- NDT measurement data (via M-Lab)
- NIST RPKI deployment monitor
- Route Views Project BGP announcement data archive
- Steam stats
- Tor Metrics data (which is specific to the use of Tor software)