2024-11-15

Source Pollution Attack - A Hidden Threat in Cybersecurity

In April 2024, CyCraft's cybersecurity research manager, Boik Su, and researcher, Stanley Cheng, presented a talk titled "Source Pollution Attack - A Hidden Threat in Cybersecurity" at the FIRST CTI conference in Germany. They discussed the definition of intelligence sources, their relationship with Cyber Threat Intelligence (CTI), how reliable intelligence sources help organizations and communities detect and defend against threats, and how compromised intelligence sources can undermine the trust that organizations have built over time.

Definition of Intelligence Sources

"Intelligence sources" refer to information gathered from public websites, magazines, articles, and other publicly available data, as well as non-public intelligence sources circulated through specific channels. For instance, the U.S. National Defense Authorization Act for Fiscal Year 2006 defines Open-Source Intelligence (OSINT) as "intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement." In the cybersecurity community, Justin Nordine, an OSINT researcher, compiled various public intelligence sources into the OSINT Framework, providing a comprehensive list of available intelligence.

Relationship Between Intelligence Sources and Cyber Threat Intelligence (CTI)

CTI offers crucial information about attackers or attack incidents, including but not limited to the malware attackers use, attack techniques, victim distribution, C2 server IPs, and phishing websites. This data enhances the cybersecurity posture of communities by supporting the development of reliable detection rules and mitigation measures, thereby reducing the likelihood of similar incidents recurring.

Developing CTI requires domain expertise and the ability to obtain frontline intelligence. Organizations often choose to use pre-compiled CTI information rather than producing it themselves. Providers that consolidate and offer CTI information are considered intelligence sources. The tools used to collect and analyze CTI also affect the final data presented and are included within the scope of intelligence sources.

Positive Impacts of Intelligence Sources

CTI strengthens cybersecurity postures in several ways. For example, IP and domain blacklists are commonly used by firewalls, mail servers, and ISPs to prevent connections to untrusted or malicious sites. Blacklist sources include organizations that maintain their own lists as well as organizations that aggregate lists from various agencies for public query. Spamhaus, for instance, provides several types of blacklists based on extensive analysis and research, while platforms like DNSBL.info allow users to query multiple blacklists simultaneously.

Figure 1: Spamhaus Blacklist
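To make the blacklist lookup described above concrete, the following minimal sketch shows how a DNS-based blacklist is conventionally queried: the octets of the IPv4 address are reversed and prepended to the list's zone name, and the resulting name is resolved. The function names and the zone `dnsbl.example.org` are placeholders chosen for illustration, not the API of any particular provider.

```python
import ipaddress
import socket


def build_dnsbl_query(ip: str, zone: str) -> str:
    """Return the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Best-effort check: any address answer is treated as 'listed'.

    A failed resolution is treated as 'not listed', which also hides
    network errors, so a production client should distinguish the two.
    """
    try:
        socket.gethostbyname(build_dnsbl_query(ip, zone))
        return True
    except socket.gaierror:
        return False


# Placeholder zone; substitute the zone of the list actually in use.
query_name = build_dnsbl_query("192.0.2.10", "dnsbl.example.org")
```

Some list operators encode status or error conditions in the returned address, so a real client should inspect the answer rather than rely on its mere presence.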

Network fingerprinting is another application, identifying malicious websites through unique characteristics. With technological advancements, fingerprinting now extends beyond TLS protocols to include HTTP, SSH, TCP, and more. Services like SSLBL collect fingerprints of malicious websites, and technical teams that define fingerprint calculation methods are also considered intelligence sources.

Signatures are widely used in analysis and detection, enabling antivirus software or sandboxes to quickly identify malware. Malware signatures range from direct indicators such as file hashes to more complex conditions written into detection rules, such as YARA or Sigma rules. Authors who publish these detection rules are also sources of such intelligence.
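As a small illustration of the simplest kind of signature mentioned above, the sketch below matches a file's SHA-256 hash against a set of known-bad hashes. The function names and sample data are hypothetical; real products combine hashes with richer rules, precisely because a single changed byte defeats a hash match.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hexadecimal SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()


def matches_hash_indicator(data: bytes, known_bad: set[str]) -> bool:
    """Return True if the data's hash appears in the indicator set."""
    normalized = {h.lower() for h in known_bad}  # feeds vary in letter case
    return sha256_hex(data) in normalized


# Hypothetical sample standing in for a malicious file.
sample = b"example payload"
indicators = {sha256_hex(sample).upper()}

exact_copy_detected = matches_hash_indicator(sample, indicators)
# Appending one byte yields a different hash, so the variant is missed.
variant_detected = matches_hash_indicator(sample + b"!", indicators)
```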

DNS records are a type of CTI that is often overlooked, yet they can provide more information than expected. For example, many clients of online services place verification codes in TXT records to prove domain ownership, and these codes often contain service-related strings such as google-site-verification, which indicates the use of Google services.

TXT records also carry SPF records, which specify which addresses may send email on behalf of a domain, enhancing email security and reducing phishing attacks. The authorized sending addresses reveal third-party organizations the domain trusts. Similarly, CAA records let a domain specify which certificate authorities may issue certificates for it, revealing further organizations the domain trusts.
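The kind of inference described above can be sketched with plain string handling. The example assumes the TXT record strings have already been retrieved; the function names are illustrative, and the hint table contains only the google-site-verification prefix mentioned in this article, so it would need extending in practice.

```python
def spf_trusted_parties(txt_record: str) -> list[str]:
    """Return domains referenced by include: or redirect= in an SPF record."""
    terms = txt_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return []  # not an SPF record
    trusted = []
    for term in terms[1:]:
        lowered = term.lower().lstrip("+-~?")  # drop an optional qualifier
        if lowered.startswith("include:"):
            trusted.append(lowered[len("include:"):])
        elif lowered.startswith("redirect="):
            trusted.append(lowered[len("redirect="):])
    return trusted


# Prefix-to-service hints; only the example from the article is included.
VERIFICATION_HINTS = {"google-site-verification=": "Google"}


def verification_services(txt_records: list[str]) -> set[str]:
    """Return services hinted at by ownership-verification TXT records."""
    found = set()
    for record in txt_records:
        for prefix, service in VERIFICATION_HINTS.items():
            if record.lower().startswith(prefix):
                found.add(service)
    return found


# Hypothetical records for an example domain.
records = [
    "v=spf1 include:_spf.google.com ip4:192.0.2.0/24 ~all",
    "google-site-verification=abc123",
]
spf_parties = spf_trusted_parties(records[0])
services = verification_services(records)
```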

Depending on how they are acquired, DNS records are categorized as active or passive. Active records are obtained by sending DNS requests to DNS servers, while passive records come from organizations that collect large amounts of DNS data over time.

Negative Impacts of Intelligence Sources

While intelligence sources provide real-time threat identification, information correlation, and system integration, uncritically accepting this information can lead to misdirection if the intelligence has been maliciously altered. Negative impacts include mistakenly identifying legitimate services as malware and blocking their connections, treating malicious files as benign and failing to isolate them, or allowing automated systems to download malicious scripts that execute arbitrary commands. Relevant cases are as follows:

Figure 2: Intelligence Source Contamination Threat Model

JARM Confusion

The first example involves confusion arising from how an intelligence source is used or designed. JARM is a network fingerprinting technique based on hash values calculated from specific TLS server settings. The method is simple and quick. However, because it relies on only part of the server's configuration, a malicious TLS server can adjust its configuration to obtain the same JARM fingerprint as a legitimate server and so disguise itself. Conversely, a legitimate TLS server might be mistakenly identified as malicious because it shares a JARM fingerprint with a known-bad server, blurring the line between malicious and normal entities.
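The collision problem can be illustrated with a deliberately simplified fingerprint. The sketch below is not the JARM algorithm, which derives its value from a server's responses to a series of crafted TLS handshakes. It is a toy function that hashes a hypothetical subset of server attributes, to show why any fingerprint built from partial settings can be copied.

```python
import hashlib

# Hypothetical subset of attributes that the toy fingerprint considers.
FINGERPRINTED_FIELDS = ("tls_versions", "cipher_order", "extensions")


def toy_fingerprint(server: dict) -> str:
    """Hash only the fingerprinted fields; every other attribute is ignored."""
    material = "|".join(",".join(server[field]) for field in FINGERPRINTED_FIELDS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


legitimate = {
    "tls_versions": ["1.2", "1.3"],
    "cipher_order": ["TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"],
    "extensions": ["alpn", "sni"],
    "purpose": "corporate web portal",
}

# An attacker copies the fingerprinted settings; everything else differs.
imitator = dict(legitimate, purpose="command-and-control panel")

same_fingerprint = toy_fingerprint(legitimate) == toy_fingerprint(imitator)
```

Because the two servers are indistinguishable to the fingerprint, a match is better treated as one signal to corroborate than as a verdict on its own.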

Misconceptions About IP and Domain Blocklists

The second example pertains to service disruption. DShield offers a public platform where users can submit firewall logs, and it produces public blacklists through statistical analysis of common attack sources. Besides blacklists, however, the official website also provides other lists derived from data aggregation, such as Top 100 Source IPs or All Source IPs. DShield explicitly notes that these lists should not be used as blocklists, but some third-party organizations still treat them as blocklists. As a result, legitimate IPs are wrongly identified as malicious and clients reject their connections, disrupting access to services.

Figure 3: DShield List, marked "DO NOT USE AS BLOCKLIST"
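A consumer can guard against this kind of misuse by reading a feed's own notices before ingesting it. The sketch below assumes a plain-text feed in which lines starting with "#" are comments; that format, the warning phrases, and the function names are assumptions made for illustration and do not describe DShield's actual file layout.

```python
# Phrases that signal a list is not meant for blocking (assumed wording).
WARNING_PHRASES = ("do not use as blocklist", "do not use as a blocklist")


def feed_forbids_blocking(feed_text: str, comment_prefix: str = "#") -> bool:
    """Return True if a comment line warns against using the feed for blocking."""
    for line in feed_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(comment_prefix) and any(
            phrase in stripped.lower() for phrase in WARNING_PHRASES
        ):
            return True
    return False


def load_blocklist(feed_text: str, comment_prefix: str = "#") -> list[str]:
    """Return the feed's entries, refusing feeds that carry a warning."""
    if feed_forbids_blocking(feed_text, comment_prefix):
        raise ValueError("feed is marked as not suitable for blocking")
    entries = []
    for line in feed_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            entries.append(stripped)
    return entries


# Hypothetical feeds using documentation-range addresses.
statistics_feed = "# Top source IPs\n# DO NOT USE AS BLOCKLIST\n192.0.2.1\n"
curated_feed = "# Recommended block list\n198.51.100.7\n"
```

A check like this only catches warnings that appear in the file itself; notices published solely on a website still have to be read by a person.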

Metadata Pollution in npm Ecosystem Management Tools

The third example involves the threat of arbitrary downloads resulting from malicious manipulation of intelligence sources. In June 2023, Darcy Clarke, former principal engineer at npm, disclosed a metadata pollution issue in npm ecosystem management tools. He revealed that npm's official ecosystem management tools allowed developers to update package metadata through different interfaces, enabling attackers to pollute metadata and trigger arbitrary downloads, leading to remote code execution threats.

The root cause was that npm failed to verify the consistency between developer-provided metadata and the original code. Attackers could exploit the update interfaces to pollute metadata, in turn affecting third-party tools or services that rely on this source. The pollution steps are briefly outlined as follows:

Source Pollution Attacks in npm and Threat Mitigation Strategies

Introduction

A package is normally published to the npm registry through a standard procedure.
However, a vulnerability exists in the registry: developers can issue a PUT request to the endpoint https://registry.npmjs.com/-/<package-name>, as illustrated in Figure 4. Within this request, fields such as name, version, scripts, and dependencies under the version object can be freely modified.

{
  _id: '<pkg>',
  name: '<pkg>',
  'dist-tags': { ... },
  versions: {
    '<version>': {
      _id: '<pkg>@<version>',
      name: '<pkg>',
      version: '<version>',
      dist: {
        integrity: '<tarball-sha512-hash>',
        shasum: '<tarball-sha1-hash>',
        tarball: ''
      },
      scripts: {},
      dependencies: {}
    }
  },
  _attachments: {
    0: {
      content_type: 'application/octet-stream',
      data: '<tarball-base64-string>',
      length: '<tarball-length>'
    }
  }
}

As a result, discrepancies arise between the metadata stored at:

https://registry.npmjs.com/<pkg>/
https://www.npmjs.com/package/<pkg>/v/<version>?activeTab=explore

This inconsistency causes confusion for developers who rely on authoritative sources for package information. More critically, it introduces severe security threats such as:

- Malicious file downloads
- Arbitrary script execution
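One way to catch the discrepancy described above is to compare the version object served by the registry with the package.json packed inside the tarball. The sketch below assumes both have already been fetched and parsed into dictionaries; the function name and the list of compared fields are illustrative choices, not part of any npm tooling.

```python
# Fields the article identifies as freely modifiable in the publish request.
CHECKED_FIELDS = ("name", "version", "scripts", "dependencies")
MAPPING_FIELDS = ("scripts", "dependencies")


def manifest_discrepancies(registry_version: dict, tarball_package_json: dict) -> dict:
    """Return {field: (registry_value, tarball_value)} for every mismatch."""
    differences = {}
    for field in CHECKED_FIELDS:
        # A missing scripts/dependencies entry is treated as an empty mapping.
        default = {} if field in MAPPING_FIELDS else None
        from_registry = registry_version.get(field, default)
        from_tarball = tarball_package_json.get(field, default)
        if from_registry != from_tarball:
            differences[field] = (from_registry, from_tarball)
    return differences


# Hypothetical data: the registry metadata hides an install script.
registry_view = {"name": "demo-pkg", "version": "1.0.0", "scripts": {}, "dependencies": {}}
tarball_view = {
    "name": "demo-pkg",
    "version": "1.0.0",
    "scripts": {"install": "node setup.js"},
}

mismatches = manifest_discrepancies(registry_view, tarball_view)
```

Any non-empty result is a reason to distrust the registry metadata and fall back to the tarball contents, since the tarball is what actually gets installed.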

Despite these issues, npm has yet to address this vulnerability, leaving third-party platforms that rely on registry metadata at risk.

Mitigation Strategies

To effectively identify, detect, and defend against source pollution attacks, we propose two key strategies:

- Multi-factor Validations
- Source Awareness

1. Multi-factor Validations

In the Cyber Threat Intelligence (CTI) lifecycle, there is already a phase requiring users to periodically verify the authenticity of intelligence sources. Based on our analysis of this attack model, we propose the following practical actions:

- Establish a practical verification process that draws on at least two trusted validation sources.
- Even for trustworthy intelligence sources, enforce verification if any of the following occurs:
  - The source was recently compromised.
  - The organization that maintains the source has recently changed.
  - The source is located in a region experiencing political, geopolitical, or religious unrest.
- Treat automated intelligence collection as fallible; its risks include:
  - Latency and accuracy issues when retrieving intelligence via APIs.
  - Hallucination risks in intelligence obtained through Large Language Models (LLMs).
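The actions above can be sketched as two small checks: one that flags a source for re-verification when any of the listed conditions applies, and one that accepts an indicator only when at least two sources agree on it. The class, field, and function names are hypothetical, and the feed contents are made-up examples.

```python
from dataclasses import dataclass


@dataclass
class SourceStatus:
    """Hypothetical record of what is known about one intelligence source."""
    name: str
    recently_compromised: bool = False
    maintainer_changed: bool = False
    region_unrest: bool = False


def needs_reverification(status: SourceStatus) -> bool:
    """Return True if any listed trigger applies, even for a trusted source."""
    return (
        status.recently_compromised
        or status.maintainer_changed
        or status.region_unrest
    )


def confirmed_by_quorum(
    indicator: str, verdicts: dict[str, set[str]], minimum: int = 2
) -> bool:
    """Return True if at least `minimum` sources list the indicator."""
    confirming = [name for name, listed in verdicts.items() if indicator in listed]
    return len(confirming) >= minimum


# Made-up feeds using documentation-range addresses.
verdicts = {
    "feed-a": {"198.51.100.7"},
    "feed-b": {"198.51.100.7", "203.0.113.9"},
    "feed-c": set(),
}

act_on_first = confirmed_by_quorum("198.51.100.7", verdicts)
act_on_second = confirmed_by_quorum("203.0.113.9", verdicts)
```

A quorum only helps if the sources are independent; two feeds that republish the same upstream list count as one source for this purpose.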

2. Source Awareness

A strong monitoring and management process for intelligence sources is the best defense against source pollution attacks. In practice, many Security Operations Centers (SOCs) rely on automated tools and systems, making it difficult to detect contaminated intelligence sources in real time. The later an organization identifies malicious intelligence, the more vulnerable its overall security posture becomes.

For example, at the end of 2023, a misconfiguration in Spamhaus's CSS blocklist led to a temporary false-positive incident, causing email disruptions across Japan and several other countries. Many IT professionals were unaware that the issue came from their email gateways referencing Spamhaus's blocklist, which delayed resolution and disrupted daily operations (see Figure 5).

Had IT teams maintained robust intelligence management procedures, they could have identified the cause and mitigated the disruption much sooner.

Figure 5: Reflections of an IT Professional After the Spamhaus Incident
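A simple form of source awareness is an inventory that records which systems consume which intelligence sources, so that an incident at a source can be traced to the affected systems at once. The sketch below is a minimal example; the system and source names are hypothetical.

```python
from collections import defaultdict


def build_source_index(consumers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a system -> sources mapping into source -> sorted systems."""
    index = defaultdict(list)
    for system, sources in consumers.items():
        for source in sources:
            index[source].append(system)
    return {source: sorted(systems) for source, systems in index.items()}


def affected_systems(index: dict[str, list[str]], source: str) -> list[str]:
    """Return the systems that depend on a source, or an empty list."""
    return index.get(source, [])


# Hypothetical inventory of what each system references.
consumers = {
    "mail-gateway": ["spamhaus-css", "internal-denylist"],
    "edge-firewall": ["internal-denylist"],
}

index = build_source_index(consumers)
impacted = affected_systems(index, "spamhaus-css")
```

With such an index in place, a report of false positives at one blocklist immediately points to the mail gateway instead of leaving the team to rediscover the dependency during an outage.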

Conclusion

To fundamentally mitigate source pollution attacks, organizations must perform frequent multi-factor validation, even for trustworthy intelligence sources. Additionally, maintaining a well-structured and transparent intelligence monitoring and management framework will improve incident response time when threats arise.

In summary, by enforcing periodic intelligence verification, maintaining full visibility over intelligence sources, and promptly replacing unstable sources, organizations can establish a mature and resilient Cyber Threat Intelligence (CTI) lifecycle.

Writer: Stanley Cheng

About CyCraft

CyCraft is a cybersecurity company founded in 2017, focusing on autonomous AI technology. Headquartered in Taiwan, it has subsidiaries in Japan and Singapore. CyCraft provides professional cybersecurity services to government agencies, police and defense forces, banks, and high-tech manufacturers throughout the Asia-Pacific region. It has received strong backing from the CID Group and Pavilion Capital, a Temasek Holdings Private Limited subsidiary.
