In April 2024, CyCraft cybersecurity research manager Boik Su and researcher Stanley Cheng presented a talk titled "Source Pollution Attack — A Hidden Threat in Cybersecurity" at the FIRST CTI conference in Germany. They discussed what intelligence sources are and how they relate to Cyber Threat Intelligence (CTI), how reliable intelligence sources help organizations and communities detect and defend against threats, and how compromised intelligence sources can undermine the trust that organizations have built over time.
"Intelligence sources" covers information gathered from public websites, magazines, articles, and other publicly available data, as well as non-public intelligence circulated through specific channels. For instance, the U.S. National Defense Authorization Act for Fiscal Year 2006 defines Open-Source Intelligence (OSINT) as "intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement." In the cybersecurity community, OSINT researcher Justin Nordine compiled a wide range of public intelligence sources into the OSINT Framework, a comprehensive catalog of available resources.
CTI provides crucial information about attackers and attack incidents, including but not limited to the malware attackers use, their attack techniques, victim distribution, C2 server IP addresses, and phishing websites. This data strengthens a community's cybersecurity posture by supporting the development of reliable detection rules and mitigation measures, thereby reducing the likelihood that similar incidents recur.
Producing CTI requires domain expertise and access to frontline intelligence, so many organizations consume pre-compiled CTI rather than producing it themselves. Providers that consolidate and distribute CTI are therefore intelligence sources. The tools used to collect and analyze CTI also shape the final data presented, so they too fall within the scope of intelligence sources.
CTI strengthens cybersecurity postures in several ways. For example, IP and domain blacklists are commonly used in firewalls, mail servers, and by ISPs to block connections to untrusted or malicious sites. Blacklist providers include organizations that maintain their own lists as well as services that aggregate lists from multiple agencies for public query: Spamhaus publishes various types of blacklists based on extensive analysis and research, while platforms such as DNSBL.info let users query many blacklists at once.
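To make the DNSBL query model concrete, here is a minimal Python sketch of how a client typically checks an IPv4 address against a DNS-based blocklist: the octets are reversed, the blocklist zone is appended, and an A-record answer means the address is listed (the convention described in RFC 5782). The function names are hypothetical, and the zone is whatever list the reader chooses to query.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name for an IPv4 address.

    The octets are reversed and prepended to the blocklist zone, so
    192.0.2.99 checked against zone.example becomes 99.2.0.192.zone.example.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


def dnsbl_lookup(ip: str, zone: str) -> Optional[str]:
    """Return the A record the blocklist answers with, or None.

    Every lookup error (NXDOMAIN, temporary failure, refused query) is
    reported as None here. That is a policy choice, not proof that the
    address is clean.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

RFC 5782 asks DNSBLs to list 127.0.0.2 as a test entry, so `dnsbl_lookup("127.0.0.2", "zen.spamhaus.org")` should return an address in 127.0.0.0/8 when the local resolver is allowed to query that list. The meaning of each returned code is provider-specific, and some providers, Spamhaus included, may reject queries arriving through large public resolvers, so the raw answer is returned rather than a yes/no verdict.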
Network fingerprinting is another application: it identifies malicious servers by their distinctive protocol characteristics. Fingerprinting now extends beyond TLS to HTTP, SSH, TCP, and other protocols. Services such as SSLBL collect fingerprints of malicious servers, and the technical teams that define fingerprint calculation methods are intelligence sources as well.
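As an illustration of what a fingerprint calculation method actually defines, the sketch below computes a JA3 fingerprint, a widely used TLS client fingerprint: selected ClientHello fields are written as decimal values, joined into a string, and hashed with MD5, with GREASE values (RFC 8701) skipped. It assumes the ClientHello has already been parsed into integers; the function names are our own, not an official library's API.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """True for GREASE placeholder values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _dash_join(values: Iterable[int]) -> str:
    # Each JA3 field is a "-"-joined list of decimal values, GREASE excluded.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers, extensions, curves, point_formats) -> str:
    """Build the comma-separated JA3 input string from ClientHello fields."""
    return ",".join([
        str(version),
        _dash_join(ciphers),
        _dash_join(extensions),
        _dash_join(curves),
        _dash_join(point_formats),
    ])


def ja3_hash(*fields) -> str:
    """JA3 fingerprint: the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


if __name__ == "__main__":
    # 2570 (0x0A0A) is a GREASE cipher value and is dropped from the string.
    fields = (771, [2570, 4865, 4866, 49195], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
    print(ja3_string(*fields))  # 771,4865-4866-49195,0-23-65281-10-11,29-23-24,0
    print(ja3_hash(*fields))
```

Because only these fields enter the hash, two clients that send the same values are indistinguishable to JA3, regardless of who operates them.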
Signatures are widely used in analysis and detection, enabling antivirus software and sandboxes to identify malware quickly. Signatures range from direct indicators such as file hashes to more complex conditions written as detection rules, such as YARA or Sigma rules. The authors who publish these detection rules are also sources of intelligence.
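The simplest signature type, a file hash, can be matched as in the following Python sketch. The feed format assumed here (one SHA-256 value per line, with other lines ignored) is hypothetical. Whoever controls such a feed decides what gets flagged, so a polluted entry carrying a legitimate file's hash would turn directly into a false positive.

```python
import hashlib
import io
from typing import BinaryIO, Iterable, Set

HEX_DIGITS = set("0123456789abcdef")


def sha256_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def load_iocs(lines: Iterable[str]) -> Set[str]:
    """Keep well-formed SHA-256 values (64 hex chars); skip blanks, comments and junk."""
    iocs = set()
    for line in lines:
        value = line.strip().lower()
        if len(value) == 64 and set(value) <= HEX_DIGITS:
            iocs.add(value)
    return iocs


def is_known_bad(stream: BinaryIO, iocs: Set[str]) -> bool:
    """True if the stream's SHA-256 appears in the IOC set."""
    return sha256_stream(stream) in iocs


if __name__ == "__main__":
    feed = ["# hypothetical feed", hashlib.sha256(b"sample").hexdigest().upper()]
    iocs = load_iocs(feed)
    print(is_known_bad(io.BytesIO(b"sample"), iocs))  # True
    print(is_known_bad(io.BytesIO(b"other"), iocs))   # False
```

For files on disk, pass an open handle instead: `with open(path, "rb") as fh: is_known_bad(fh, iocs)`.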
DNS records are an often overlooked type of CTI that can reveal more than expected. For example, many organizations that use online services place verification codes in TXT records to prove domain ownership, and these codes often contain service-specific strings; a record beginning with google-site-verification, for instance, indicates the use of Google services.
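A rough Python sketch of this inference follows. The prefix-to-service table is an illustrative assumption and deliberately incomplete, and the TXT records are assumed to have been fetched already by some other means.

```python
from typing import Iterable, Set

# Illustrative, deliberately incomplete mapping of TXT prefixes to services.
VERIFICATION_PREFIXES = {
    "google-site-verification=": "Google",
    "ms=": "Microsoft 365",
    "facebook-domain-verification=": "Meta (Facebook)",
    "atlassian-domain-verification=": "Atlassian",
}


def services_from_txt(records: Iterable[str]) -> Set[str]:
    """Infer third-party services from domain-verification TXT records."""
    found = set()
    for record in records:
        # Records are often displayed with surrounding quotes; compare case-insensitively.
        text = record.strip().strip('"').lower()
        for prefix, service in VERIFICATION_PREFIXES.items():
            if text.startswith(prefix):
                found.add(service)
    return found
```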
TXT records also hold SPF records, which specify the hosts authorized to send email on behalf of a domain, strengthening email security and reducing phishing. The authorized senders reveal which third-party organizations the domain trusts. CAA records work similarly: they let a domain specify which certificate authorities (CAs) may issue certificates for it, again revealing organizations the domain trusts.
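The same inference applies to SPF and CAA records. This Python sketch extracts the domains an SPF record delegates to through include: and redirect=, and the CA domains allowed by issue/issuewild CAA tags. It assumes records arrive as presentation-format strings and ignores edge cases such as SPF records split across multiple TXT strings; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def spf_referenced_domains(spf: str) -> List[str]:
    """Domains named by include: mechanisms and the redirect= modifier."""
    terms = spf.strip().strip('"').split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    domains = []
    for term in terms[1:]:
        mechanism = term.lstrip("+-~?")  # drop the optional qualifier
        lowered = mechanism.lower()
        if lowered.startswith("include:"):
            domains.append(mechanism[len("include:"):])
        elif lowered.startswith("redirect="):
            domains.append(mechanism[len("redirect="):])
    return domains


def caa_authorized_issuers(records: Iterable[str]) -> Set[str]:
    """Issuer domains from CAA records in presentation form, e.g. '0 issue "ca.example"'."""
    issuers = set()
    for record in records:
        _flags, tag, value = record.split(maxsplit=2)
        if tag.lower() in ("issue", "issuewild"):
            issuer = value.strip('"').split(";")[0].strip()
            if issuer:  # an empty issuer (";") means no CA may issue
                issuers.add(issuer)
    return issuers
```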
Depending on how they are acquired, DNS records are categorized as active or passive. Active records are obtained by sending queries to DNS servers, while passive records come from organizations that collect and store large volumes of observed DNS resolution data over time.
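Passive DNS databases typically summarize many individual observations into records with first-seen and last-seen times. The sketch below shows that folding step in Python; the field names loosely follow those commonly used in passive DNS output (rrname, rrtype, rdata, time_first, time_last, count), and the observation tuples are a hypothetical input format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Observation = Tuple[datetime, str, str, str]  # (seen_at, rrname, rrtype, rdata)


@dataclass
class PassiveRecord:
    rrname: str
    rrtype: str
    rdata: str
    time_first: datetime
    time_last: datetime
    count: int


def aggregate(observations: Iterable[Observation]) -> List[PassiveRecord]:
    """Fold individual DNS answers into first-seen/last-seen records."""
    table: Dict[Tuple[str, str, str], PassiveRecord] = {}
    for seen_at, rrname, rrtype, rdata in observations:
        # Normalise names so "Example.com." and "example.com" collapse together.
        key = (rrname.lower().rstrip("."), rrtype.upper(), rdata)
        record = table.get(key)
        if record is None:
            table[key] = PassiveRecord(*key, seen_at, seen_at, 1)
        else:
            record.time_first = min(record.time_first, seen_at)
            record.time_last = max(record.time_last, seen_at)
            record.count += 1
    return list(table.values())
```

Unlike an active query, which only shows the current answer, such records preserve history, for example when a domain previously resolved to infrastructure later flagged as malicious.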
Intelligence sources enable real-time threat identification, information correlation, and system integration, but accepting their output uncritically can mislead defenders if the intelligence has been maliciously altered. Possible consequences include flagging legitimate services as malware and blocking their connections, treating malicious files as benign and failing to isolate them, and letting automated systems download malicious scripts that execute arbitrary commands. Three relevant cases follow.
The first example involves confusion arising from how an intelligence source is designed or used. JARM is an active TLS server fingerprinting technique: it sends ten crafted TLS Client Hello messages to a server and hashes characteristics of the responses, such as the negotiated TLS version, cipher, and extensions. The method is simple and fast. However, because it reflects only part of a server's configuration, a malicious TLS server can adjust its settings to produce the same JARM fingerprint as a legitimate server and thereby disguise itself. Conversely, legitimate TLS servers may be misidentified as malicious because they share a JARM fingerprint with malicious ones, blurring the line between malicious and benign hosts.
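The weakness can be illustrated with a deliberately simplified, JARM-inspired sketch. It is not the real JARM algorithm (which builds a 62-character fingerprint from ten specific probes); it only demonstrates the property that matters here: the hash depends solely on negotiated TLS parameters, so any server that reproduces those parameters receives the same fingerprint. The response tuples are a hypothetical summary format.

```python
import hashlib
from typing import List, Tuple

# One summarised ServerHello per probe: (TLS version, chosen cipher, extension IDs).
ProbeResponse = Tuple[int, int, Tuple[int, ...]]


def jarm_like_fingerprint(responses: List[ProbeResponse]) -> str:
    """Simplified, JARM-inspired fingerprint; NOT the real JARM algorithm.

    Only negotiated TLS parameters are hashed. Certificates, hostnames and
    served content never enter the calculation.
    """
    parts = []
    for version, cipher, extensions in responses:
        ext = "-".join(f"{e:04x}" for e in extensions)
        parts.append(f"{version:04x}|{cipher:04x}|{ext}")
    return hashlib.sha256(",".join(parts).encode("ascii")).hexdigest()


if __name__ == "__main__":
    legit_tls = [(0x0303, 0xC02F, (0xFF01, 0x000B))] * 10
    # A malicious server that copies the same TLS settings answers identically,
    # even though it hosts entirely different content.
    mimic_tls = [(0x0303, 0xC02F, (0xFF01, 0x000B))] * 10
    print(jarm_like_fingerprint(legit_tls) == jarm_like_fingerprint(mimic_tls))  # True
```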
The second example concerns service disruption. DShield runs a public platform where users submit firewall logs, and it publishes blacklists derived from statistical analysis of common attack sources. Besides blacklists, however, the official website offers other aggregated lists, such as Top 100 Source IPs and All Source IPs. DShield explicitly notes that these lists should not be used as blocklists, yet some third-party organizations still treat them as such. As a result, legitimate IPs are flagged as malicious, clients refuse their connections, and access to services is disrupted.
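One way to avoid repeating that mistake is to record each feed's intended use, as stated by its publisher, and let only designated blocklists reach enforcement points. The Python sketch below assumes a hypothetical in-house feed catalog; the feed names, URLs, and FeedUse categories are placeholders, and the intended use has to be transcribed manually from each publisher's documentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FeedUse(Enum):
    BLOCKLIST = "blocklist"    # the publisher intends the list for blocking
    STATISTICS = "statistics"  # aggregate views such as "top sources"
    CONTEXT = "context"        # enrichment only, never enforced


@dataclass(frozen=True)
class Feed:
    name: str
    url: str
    intended_use: FeedUse  # recorded by hand from the publisher's documentation


def feeds_for_blocking(feeds: List[Feed]) -> List[Feed]:
    """Keep only feeds whose publisher designates them as blocklists."""
    return [feed for feed in feeds if feed.intended_use is FeedUse.BLOCKLIST]


if __name__ == "__main__":
    catalog = [
        Feed("example-blocklist", "https://feeds.example.org/block.txt", FeedUse.BLOCKLIST),
        Feed("example-top-sources", "https://feeds.example.org/top.txt", FeedUse.STATISTICS),
    ]
    print([feed.name for feed in feeds_for_blocking(catalog)])  # ['example-blocklist']
```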
The third example involves arbitrary downloads resulting from malicious manipulation of an intelligence source. In June 2023, Darcy Clarke, a former principal engineer at npm, disclosed a metadata pollution issue in the npm ecosystem, which he called "manifest confusion". He showed that npm's official tooling let developers update package metadata through different interfaces, so attackers could pollute metadata to trigger arbitrary downloads and, ultimately, remote code execution.
The root cause was that npm did not verify that the metadata supplied by the developer matched the code actually published. Attackers could therefore use the update interface to pollute metadata, which in turn affects third-party tools and services that rely on this source. The pollution steps are briefly outlined as follows:
```
{
  _id: '<pkg>',
  name: '<pkg>',
  'dist-tags': { ... },
  // manifest metadata served to clients
  versions: {
    '<version>': {
      _id: '<pkg>@<version>',
      name: '<pkg>',
      version: '<version>',
      dist: {
        integrity: '<tarball-sha512-hash>',
        shasum: '<tarball-sha1-hash>',
        tarball: ''
      },
      // not checked against the package.json inside the tarball
      scripts: {},
      dependencies: {}
    }
  },
  // the packaged code itself, as a base64-encoded tarball
  _attachments: {
    0: {
      content_type: 'application/octet-stream',
      data: '<tarball-base64-string>',
      length: '<tarball-length>'
    }
  }
}
```
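To make the missing consistency check concrete, the following Python sketch compares a registry manifest (the per-version object under versions in a publish body like the one above) against the package.json packed inside the tarball. It assumes the usual npm layout with files under package/ and checks only a few fields; all function names are hypothetical. The demo shows why the dist.integrity hash alone is not enough: it matches because the tarball is genuine, while a polluted scripts field goes unnoticed unless it is compared explicitly.

```python
import base64
import hashlib
import io
import json
import tarfile

# Manifest fields an installer acts on; scripts and dependencies drive what runs and is fetched.
CHECKED_FIELDS = ("name", "version", "scripts", "dependencies")
DICT_FIELDS = {"scripts", "dependencies"}  # an absent field and {} are treated as equal


def sri_sha512(data: bytes) -> str:
    """Digest in the "sha512-<base64>" form used by dist.integrity."""
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def read_packaged_manifest(tarball: bytes) -> dict:
    """Read package/package.json from a gzipped npm tarball (assumed layout)."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        member = tar.extractfile("package/package.json")  # KeyError if absent
        if member is None:
            raise ValueError("package/package.json is not a regular file")
        return json.load(member)


def manifest_mismatches(registry_manifest: dict, tarball: bytes) -> list:
    """List fields where registry metadata disagrees with the published code."""
    problems = []
    # Integrity only proves the tarball is intact; it says nothing about the manifest.
    if registry_manifest.get("dist", {}).get("integrity") != sri_sha512(tarball):
        problems.append("dist.integrity")
    packaged = read_packaged_manifest(tarball)
    for field in CHECKED_FIELDS:
        default = {} if field in DICT_FIELDS else None
        if registry_manifest.get(field, default) != packaged.get(field, default):
            problems.append(field)
    return problems


def build_tarball(package_json: dict) -> bytes:
    """Demo helper: pack a package.json under package/ as npm tarballs do."""
    payload = json.dumps(package_json).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("package/package.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    tgz = build_tarball({"name": "demo", "version": "1.0.0", "scripts": {}})
    honest = {"name": "demo", "version": "1.0.0", "scripts": {},
              "dist": {"integrity": sri_sha512(tgz)}}
    polluted = dict(honest, scripts={"install": "node payload.js"})
    print(manifest_mismatches(honest, tgz))    # []
    print(manifest_mismatches(polluted, tgz))  # ['scripts']
```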
As a result, discrepancies arise between the metadata stored at:
This inconsistency causes confusion for developers who rely on authoritative sources for package information. More critically, it introduces severe security threats such as:
Despite these issues, npm has yet to address this vulnerability, leaving third-party platforms at risk. Affected platforms include, but are not limited to:
To effectively identify, detect, and defend against source pollution attacks, we propose two key strategies:
1. Multi-factor Validation
The CTI lifecycle already includes a phase in which users periodically verify the authenticity of intelligence sources. Based on our analysis of this attack model, we propose the following actions:
2. Source Awareness
A strong monitoring and management process for intelligence sources is the best defense against source pollution attacks. In practice, many Security Operations Centers (SOCs) rely on automated tools and systems, making it difficult to detect contaminated intelligence sources in real time. The later an organization identifies malicious intelligence, the more vulnerable its overall security posture becomes.
For example, at the end of 2023, a misconfiguration in Spamhaus's CSS blocklist caused a temporary false-positive incident that disrupted email in Japan and several other countries. Many IT professionals did not realize that the problem originated from their email gateways referencing Spamhaus's blocklist, which delayed resolution and disrupted daily operations (see Figure 5).
Had IT teams maintained robust intelligence management procedures, they could have traced the problem to its source and mitigated it quickly.
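A lightweight form of source awareness is to record which feed contributed each indicator, so that a sudden wave of blocks can be traced to one source and that source withdrawn. The class below is a hypothetical Python sketch of such a provenance index, not a feature of any particular product; feed names are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class IndicatorProvenance:
    """Track which feed(s) contributed each blocked indicator."""

    def __init__(self) -> None:
        self._sources: Dict[str, Set[str]] = defaultdict(set)

    def ingest(self, feed_name: str, indicators: Iterable[str]) -> None:
        for indicator in indicators:
            self._sources[indicator.strip().lower()].add(feed_name)

    def sources_of(self, indicator: str) -> Set[str]:
        """Answer "which feed made us block this?" (empty set if unknown)."""
        return set(self._sources.get(indicator.strip().lower(), set()))

    def withdraw_feed(self, feed_name: str) -> int:
        """Disable one feed and drop indicators no other feed vouches for.

        Returns the number of indicators removed from the block set.
        """
        dropped = 0
        for indicator in list(self._sources):
            sources = self._sources[indicator]
            sources.discard(feed_name)
            if not sources:
                del self._sources[indicator]
                dropped += 1
        return dropped

    def blocked(self) -> Set[str]:
        return set(self._sources)
```

In the Spamhaus example above, a lookup such as `sources_of(sender_ip)` would have pointed straight at the blocklist behind the rejected mail.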
To fundamentally mitigate source pollution attacks, organizations must perform frequent multi-factor validation, even for trusted intelligence sources. Maintaining a well-structured, transparent framework for monitoring and managing intelligence sources will also shorten incident response time when threats arise.
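Multi-factor validation can be approximated by requiring agreement between sources before acting automatically. This Python sketch routes an indicator to automatic blocking, manual review, or no action depending on how many feeds list it. The threshold and feed names are assumptions, and the approach only helps if the feeds are genuinely independent rather than copies of one another.

```python
from typing import Dict, List, Set, Tuple


def triage(indicator: str, feeds: Dict[str, Set[str]], auto_block_at: int = 2) -> Tuple[str, List[str]]:
    """Decide how to act on an indicator based on how many feeds list it.

    Returns ("block", feeds) when at least `auto_block_at` feeds agree,
    ("review", feeds) when fewer feeds list it, and ("ignore", []) otherwise.
    """
    agreeing = sorted(name for name, entries in feeds.items() if indicator in entries)
    if len(agreeing) >= auto_block_at:
        return "block", agreeing
    if agreeing:
        return "review", agreeing
    return "ignore", agreeing
```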
In summary, by enforcing periodic intelligence verification, maintaining full visibility over intelligence sources, and promptly replacing unstable sources, organizations can establish a mature and resilient Cyber Threat Intelligence (CTI) lifecycle.
Writer: Stanley Cheng
CyCraft is a cybersecurity company founded in 2017, focusing on autonomous AI technology. Headquartered in Taiwan, it has subsidiaries in Japan and Singapore. CyCraft provides professional cybersecurity services to government agencies, police and defense forces, banks, and high-tech manufacturers throughout the Asia-Pacific region. It has received strong backing from the CID Group and Pavilion Capital, a Temasek Holdings Private Limited subsidiary.