The New York Times Exposed GitHub Token Breach

TL;DR

Think keeping your GitHub tokens safe is a walk in the park? More like running a marathon in a minefield. This post dives into a high-profile breach, the perils of exposed tokens, and why securing your non-human identities in GitHub is harder than teaching cats to code. Grab your snorkel; we're starting in the shallow end but diving all the way to the deep end.

Background About the Breach

In January 2024, The New York Times suffered a significant security breach caused by an exposed GitHub token. Attackers stole 270GB of internal data, which was later leaked on the 4chan message board. The data included sensitive IT documentation, infrastructure tools, and source code, including that of the popular game Wordle.

Initially, The Times attributed the breach to compromised credentials for a third-party cloud-based code platform but later confirmed GitHub as the source. Despite the substantial data leak, The Times reported no impact on its internal systems or operations, thanks to its continuous monitoring and rapid response to the breach.

What to Expect From This Blog Post

This blog post will delve into several critical areas surrounding the breach:

What are GitHub Tokens? An in-depth look at the various types of GitHub tokens, their uses, and the advantages of each type.
What Are the Risks with Exposed GitHub Tokens? An exploration of the potential risks and security vulnerabilities that arise from exposed tokens.
How Might Exposure Happen? Analysis of common scenarios leading to token exposure, including misconfigurations and inadequate security practices.
Preventive Measures: Best practices and strategies to prevent token exposure and secure GitHub repositories.
Detection of Exposure: Techniques and tools for identifying exposed tokens and unauthorized access.
Response Strategies: Effective response plans and actions to take following a token exposure or data breach.

GitHub in 2 Minutes

GitHub is a web-based platform for version control and collaboration, allowing developers to manage and store their code. It uses Git, a distributed version control system, to track changes in source code during software development. GitHub facilitates collaborative coding by enabling multiple developers to work on projects simultaneously and integrates various tools for continuous integration and deployment. It's widely used for open-source projects, private repositories, and as a central hub for DevOps practices.

What are GitHub Tokens?

GitHub tokens let users and applications authenticate securely to the GitHub API. These tokens come in several types, each tailored for specific use cases and offering varying levels of access and control:

Personal Access Tokens (PATs): Managed at the organization level (https://github.com/organizations/{org_slug}/settings/personal-access-tokens/active). These are user-generated tokens intended for personal automation tasks. They are tied to individual user accounts and can be customized with specific scopes to control access to different parts of the GitHub API. PATs are static strings, which makes them straightforward to use but means they require careful management to prevent misuse. There are two types of PATs:

Classic PATs: These use broad OAuth scopes, providing wide-ranging access.

Fine-Grained PATs: As the name suggests, these offer more precise control, aligning with specific permissions and access levels supported by the latest GitHub APIs. Fine-Grained PATs are generally recommended due to their enhanced security.
OAuth Tokens: Generated through OAuth applications, these tokens are managed at the organization level (https://github.com/organizations/{org_slug}/settings/oauth_application_policy). They require user consent and are tied to OAuth scopes, which define the permissions granted to the third-party application. They are ephemeral, meaning they are obtained through a user login process and have a limited lifespan, which enhances security by reducing the risk of long-term exposure.

Important note: As an organization owner on GitHub, you cannot directly manage or restrict OAuth apps at the individual user level within your organization. Users manage their own OAuth apps and personal access tokens.
GitHub App Tokens: These tokens are managed at the organization level (https://github.com/organizations/{org_slug}/settings/installations) and are associated with GitHub Apps, which operate at the app level with predefined permissions. Unlike OAuth tokens, GitHub App tokens are designed for app identity rather than user identity, allowing for more granular control over repository and organizational access.
Installation Access Tokens: Specific to GitHub Apps, these tokens are generated for individual installations of an app. Each installation can have distinct permissions, enabling fine-tuned access management. Installation tokens are ephemeral and scoped to the specific resources the app needs to access.
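The token types above can usually be told apart by the prefix GitHub puts on the token string. The following is a minimal sketch, assuming the publicly documented prefixes; the function name and the "unknown" fallback are illustrative choices, not part of any GitHub tooling.

```python
# Documented GitHub token prefixes mapped to the token types described above.
TOKEN_PREFIXES = {
    "github_pat_": "fine-grained personal access token",
    "ghp_": "personal access token (classic)",
    "gho_": "OAuth access token",
    "ghu_": "GitHub App user access token",
    "ghs_": "GitHub App installation access token",
    "ghr_": "GitHub App refresh token",
}


def classify_token(token):
    """Return a human-readable token type based on the prefix alone."""
    for prefix, kind in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return kind
    # Hypothetical fallback label, e.g. for older tokens issued without a prefix.
    return "unknown"
```

Knowing the type of a leaked token is a quick first step in triage, since it hints at whether the credential is static (a PAT) or short-lived (an installation token).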

Advantages and Disadvantages of Token Types

Personal Access Tokens (PATs)

Advantages

Ideal for individual developers automating personal tasks
Direct control over the token's scope
Fine-Grained PATs offer enhanced security with detailed permission settings

Disadvantages

Static tokens require careful management to prevent misuse
Classic PATs have broad OAuth scopes, which can lead to over-permissioning
Requires manual rotation and management

OAuth Tokens

Advantages

Best for third-party applications that need one-time user authorization
Dynamic, user-centric access model
Ephemeral tokens enhance security by reducing long-term exposure

Disadvantages

Dependency on user consent can complicate automation
Tokens are temporary, requiring periodic reauthorization
Potential for excessive permissions if scopes are not properly managed

GitHub App Tokens

Advantages

Secure, scalable authentication for apps
Predefined permissions reduce risk
Separation from user identity simplifies management

Disadvantages

Requires app-specific configuration
Less flexibility for user-specific tasks compared to PATs
Managing app permissions can be complex

Installation Access Tokens

Advantages

Granular control over app installations
Ensures each installation accesses only necessary resources
Ephemeral tokens minimize security vulnerabilities

Disadvantages

Limited to specific installations, requiring multiple tokens for different app instances
Token management can become complex with many installations
Requires GitHub App infrastructure for generation and use

Managing these tokens effectively is crucial. For instance, organizations can control and monitor PATs at the organizational level, ensuring that only necessary permissions are granted and that tokens are regularly reviewed and rotated to maintain security integrity. Centralized management of GitHub Apps and OAuth Apps within organization settings further enhances oversight and control, reducing the risk of unauthorized access.

In summary, understanding and leveraging the appropriate GitHub token type for specific use cases can significantly enhance security and operational efficiency in managing access to GitHub repositories and APIs.

When to Use and When to Avoid Different Token Types

Personal Access Tokens (PATs)

Use When

Individual developers need to automate personal tasks or scripts
Direct and specific control over access scopes is required
Fine-grained permissions are necessary for enhanced security

Avoid When

There is a need for scalable, app-level authentication
Managing a large number of tokens becomes cumbersome
Organizational policies mandate more dynamic or ephemeral access models

OAuth Tokens

Use When

Third-party applications require user consent to access resources
There is a need for a user-centric, dynamic access model
Ephemeral access is preferred to enhance security

Avoid When

User consent processes complicate automation and continuous integration workflows
Broad OAuth scopes lead to excessive permissions that exceed the principle of least privilege
Tokens need to be static and long-lived for specific use cases

GitHub App Tokens

Use When

Building and maintaining scalable applications that interact with GitHub
Predefined permissions can simplify access management and reduce security risks
Separation of app and user identities is required for better security practices

Avoid When

User-specific tasks or fine-grained control over individual user actions are needed
App-specific configuration and management are too complex for the given use case
The infrastructure to support GitHub Apps is lacking

Installation Access Tokens

Use When

Granular control over individual app installations is necessary
Ensuring minimal access to only necessary resources is a priority
Ephemeral tokens are preferred to minimize long-term security risks

Avoid When

Managing multiple installations and corresponding tokens becomes too complex
The need for more generalized or broader access tokens arises
The GitHub App infrastructure required for generating these tokens is not available

Practical Scenarios

Automating Personal Tasks: Use Fine-Grained PATs for scripts or workflows where specific, detailed access is necessary.
Integrating Third-Party Services: Opt for OAuth Tokens when a third-party service needs to interact with your GitHub repositories based on user consent.
Developing Scalable Applications: Choose GitHub App Tokens for applications that require a secure, app-level authentication mechanism with predefined permissions.
Managing Multiple Installations: Implement Installation Access Tokens for environments where granular control over each installation’s permissions is crucial.

What Are the Risks with Exposed GitHub Tokens?

Exposed GitHub tokens present significant risks to an organization due to their broad and powerful access to the GitHub API. Here are the primary risks associated with exposed tokens, some of which unfortunately materialized in the context of the breach reported by The New York Times.

Intellectual Property Exposure

Full Source Code and Commit History: Unauthorized access to repositories can reveal proprietary codebases, including commit history that links specific changes to employees. This can lead to intellectual property theft and competitive disadvantages. In the case of the breach reported by The New York Times, the result was the theft of 270GB of sensitive data, including IT documentation and source code.

Exposure of Sensitive Information

Secrets and Internal Infrastructure Details: GitHub Actions often store secrets and configuration details for internal services. Exposure can lead to further breaches of internal systems.
List of Users and Groups: Access to organization members and teams can expose sensitive employee information, potentially including PII.

Manipulation of Repository Content

Code Changes and Injection: Attackers can alter the source code, leading to supply chain attacks or facilitating lateral movement within the organization. Injected malicious code can propagate through the organization’s development pipeline.
Secret and Workflow Alterations: By modifying secrets and workflows, attackers can gain persistent access, escalate privileges, and further compromise the organization’s infrastructure.
Write, Delete, and Update Operations: The GitHub API allows for destructive actions such as deleting repositories or altering code, which can severely disrupt development and operational workflows.

How Might This Exposure Have Happened?

The exact method by which the New York Times' GitHub token was exposed remains unclear, but several potential scenarios could explain how such a breach might occur. Based on publicly known data and common security vulnerabilities, the following outlines detailed and technical possibilities that may have led to the exposure:

Human Error in Code Commits

Plaintext Token in Commits: An engineer might accidentally commit a plaintext GitHub token into the source code and push it to a repository. If this repository is public, the token becomes instantly accessible to anyone. Even if the repository is private, the token is still at risk if the repository’s access control is compromised.
Public Workflow Logs or Artifacts: A GitHub workflow executed on a public repository might inadvertently log a token. GitHub Actions logs and artifacts are accessible by anyone with access to the repository, and threat actors actively scan public repositories for such exposures. Even brief exposures can be exploited, necessitating rapid response and token rotation.

Malware or Stealer Execution

Compromised Engineer Workstations: If malware or stealer software is executed on an engineer’s machine, it could exfiltrate GitHub tokens from development environments or configuration files. The recent Snowflake-related breaches were reportedly the result of infostealer malware.
Malicious IDE Extensions: This type of malware often targets Integrated Development Environments (IDEs) and other tools where tokens might be stored in plaintext. An article published by BleepingComputer on June 9th describes recent work by a group of security researchers that demonstrates the security challenges of VSCode extensions, the magnitude of the risk they pose to organizations, and the relative ease with which a malicious extension can be introduced to a developer’s workstation.

Compromised GitHub Workflow Contexts

Malicious Pull Requests: Threat actors might contribute malicious code via a pull request to a public repository. If the organization’s CI/CD process automatically executes workflows for these contributions, and if the code passes review unnoticed, it could extract environment variables or secrets during the build process.
Artifact Exfiltration: Malicious code in workflows could write sensitive information, including tokens, to logs or artifacts. These are then accessible by the threat actor once the workflow completes, as seen in incidents like the PyTorch breach.

OAuth Token Misuse

Untrusted Software: An engineer might install an IDE plugin or another seemingly benign software that requests OAuth access to their GitHub account. If granted, this software could exfiltrate repository data the engineer has access to, based on the permissions granted during the OAuth consent process.
OAuth Scopes: If the OAuth application requests broad scopes, it could access and manipulate repositories, clone private repositories, or expose sensitive data.

Public Repositories with GitHub Actions

Large Number of Public Repos: The NY Times GitHub organization contains over a hundred public repositories, many with GitHub Actions configured. This increases the attack surface significantly.
Workflow Configurations: Public workflows might be exploited if they are not securely configured. Attackers could leverage these workflows to gain access to secrets and tokens inadvertently exposed during the build process.

Preventive Measures

To prevent exposure and enhance security when using GitHub OAuth apps and personal access tokens (PATs), it's crucial to implement the following advanced measures:

Minimizing Static Key Usage

Ephemeral Key Utilization

Prefer the use of ephemeral keys over static keys wherever possible. Ephemeral keys, such as those generated dynamically for short-lived operations, reduce the risk of long-term exposure and compromise.
Implement automated workflows to generate, use, and revoke ephemeral keys. For example, use GitHub Actions or other CI/CD tools to create short-lived tokens that expire after a specific task is completed.
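One way to obtain a short-lived credential is to have a GitHub App request an installation access token, which GitHub expires after one hour. The sketch below only builds the HTTP request for the documented `POST /app/installations/{installation_id}/access_tokens` endpoint; the installation ID, repository name, and JWT placeholder are hypothetical, and signing the app JWT with the app's private key is assumed to happen elsewhere.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_installation_token_request(app_jwt, installation_id, repositories, permissions):
    """Build (but do not send) the request for a short-lived installation token."""
    # Narrowing repositories and permissions keeps the resulting token minimal.
    body = json.dumps({"repositories": repositories, "permissions": permissions})
    return urllib.request.Request(
        url=f"{API_ROOT}/app/installations/{installation_id}/access_tokens",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {app_jwt}",
            "Accept": "application/vnd.github+json",
        },
    )


# Hypothetical values for illustration only.
request = build_installation_token_request(
    app_jwt="<signed-app-jwt>",
    installation_id=12345,
    repositories=["example-repo"],
    permissions={"contents": "read"},
)
# Sending it would be: urllib.request.urlopen(request)
```

Because the token is requested per task and limited to one repository with read-only contents access, a leak exposes far less than a long-lived PAT would.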

Token Rotation

Regularly rotate static keys and ensure old keys are promptly revoked. Implement automated scripts or use GitHub’s API to manage token rotation schedules.
Employ tools like HashiCorp Vault or AWS Secrets Manager to handle token generation, storage, and rotation securely.
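A rotation schedule can start as a simple age check over a token inventory. This is a minimal sketch, assuming you maintain your own inventory with a creation timestamp per token; the 90-day limit, the field names, and the token names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_AGE = timedelta(days=90)  # hypothetical rotation policy


def tokens_due_for_rotation(inventory, now=None):
    """Return the names of tokens older than the rotation policy allows."""
    now = now or datetime.now(timezone.utc)
    return [
        entry["name"]
        for entry in inventory
        if now - entry["created_at"] > MAX_TOKEN_AGE
    ]


# Hypothetical inventory; in practice this might come from a secrets manager.
inventory = [
    {"name": "ci-deploy", "created_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"name": "docs-bot", "created_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
due = tokens_due_for_rotation(inventory, now=datetime(2024, 6, 10, tzinfo=timezone.utc))
```

A scheduled job could run this check daily and open a ticket or trigger rotation for each name it returns.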

Granular Permission Management

Principle of Least Privilege (PoLP)

Apply the principle of least privilege to all tokens. Configure each token to have only the minimal required permissions for its specific use case. Avoid using tokens with broad access unless absolutely necessary.
Use GitHub’s fine-grained personal access tokens, which allow more precise control over the scopes and permissions of each token.

Scope Limitation

Define the narrowest possible scopes for each token. For instance, if a token is only needed for repository access, ensure it does not have organizational or administrative privileges.
Explicitly specify repository-level permissions instead of granting organization-wide access. Use the repo scope judiciously and prefer more specific scopes like repo:status and repo_deployment.
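For classic PATs and OAuth tokens, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header, which makes over-permissioned tokens easy to flag. The sketch below parses that header value and compares it with a deny list; the list of "broad" scopes is a hypothetical policy, not a GitHub recommendation.

```python
# Hypothetical policy: classic scopes this organization considers too broad.
BROAD_SCOPES = {"repo", "admin:org", "admin:enterprise", "delete_repo", "workflow"}


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes header value such as 'repo, read:org' into a set."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def overly_broad(header_value):
    """Return the granted scopes that appear on the deny list."""
    return parse_scopes(header_value) & BROAD_SCOPES
```

For example, `overly_broad("repo, read:org")` flags `repo`, while a token limited to `repo:status, repo_deployment` passes the check.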

Access Control Policies

Implement and enforce access control policies at the organization level. Use GitHub’s OAuth App access restrictions to control which third-party applications can be authorized by users.
Regularly review and audit authorized OAuth apps and personal access tokens to ensure compliance with organizational policies.

Enhanced Security Practices

Token Revocation and Monitoring

Develop and maintain scripts or use security tools to continuously monitor the usage of tokens. Identify and revoke tokens that are no longer in use or exhibit suspicious activity.
Integrate GitHub’s audit logs with SIEM (Security Information and Event Management) systems to track and analyze token-related activities in real time.

Environment Segmentation

Segregate environments (e.g., development, staging, production) and use separate tokens for each environment. This reduces the impact of a compromised token by limiting its scope to a specific environment.
Employ network segmentation and VPC (Virtual Private Cloud) configurations to restrict access to GitHub resources based on environment-specific security policies.

Automated Security Scanning

Use automated security scanning tools to detect exposed tokens within your codebase. Tools like GitHub’s Secret Scanning or third-party solutions such as TruffleHog can help identify and alert on token exposures.
Implement pre-commit hooks and CI/CD pipeline integrations to prevent the accidental inclusion of tokens in the codebase.
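A pre-commit check can be as small as a regular expression over the text about to be committed. The following sketch uses approximate patterns based on GitHub's documented token prefixes; the exact lengths are an assumption, and dedicated scanners such as the tools named above cover far more cases.

```python
import re

# Approximate patterns (an assumption, not GitHub's official scanning rules):
# prefixed tokens such as ghp_/gho_/ghu_/ghs_/ghr_ and fine-grained github_pat_ tokens.
TOKEN_PATTERN = re.compile(
    r"\b(?:gh[pousr]_[A-Za-z0-9]{36,}|github_pat_[A-Za-z0-9_]{22,})"
)


def find_token_candidates(text):
    """Return (line_number, redacted_match) pairs for anything that looks like a token."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            # Redact the match so the scanner's own output does not leak the token.
            findings.append((line_number, match.group(0)[:8] + "..."))
    return findings
```

A hook would run this over the staged changes and exit with a non-zero status when the list is not empty, which blocks the commit until the token is removed.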

Detection of Exposure

Continuous Scanning and Monitoring

Public Tool Utilization

Consistently scan your repositories for exposed tokens or potential exposures using advanced tools like TruffleHog. Configure these tools to run as part of your CI/CD pipeline to ensure real-time detection.
Leverage GitHub’s built-in Secret Scanning program, which proactively searches for exposed secrets in public repositories and notifies the repository owners. Enable this feature and configure notifications to alert your security team immediately.

Centralized Log Management and Analysis

Log Streaming and Centralization

Stream GitHub audit logs to a centralized logging system. Ensure logs are ingested in near real-time to facilitate timely detection.
Implement log filters to capture specific events related to programmatic access. Focus on the programmatic_access_type field values to identify operations conducted via OAuth tokens and personal access tokens.
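As a rough illustration of such a filter, the sketch below reads audit log events exported as one JSON object per line and keeps only those that carry a `programmatic_access_type` field. The sample events and the specific field values are illustrative assumptions; check the values your own audit log actually emits.

```python
import json


def programmatic_events(ndjson_lines, access_types=None):
    """Yield events performed with a token rather than an interactive session."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        access_type = event.get("programmatic_access_type")
        if access_type is None:
            continue  # no token involved, skip
        if access_types is None or access_type in access_types:
            yield event


# Illustrative sample events, not real log data.
sample_log = [
    '{"action": "repo.download_zip", "actor": "svc-bot", "programmatic_access_type": "OAuth access token"}',
    '{"action": "repo.create", "actor": "alice"}',
    '{"action": "git.clone", "actor": "ci-user", "programmatic_access_type": "Personal access token (classic)"}',
]
oauth_events = list(programmatic_events(sample_log, access_types={"OAuth access token"}))
```

The filtered stream is what you would forward to detection rules, since it isolates activity performed by non-human identities.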

Event Correlation and Anomaly Detection

Correlate access events across different repositories and user activities. Utilize machine learning algorithms to analyze patterns and detect anomalies that deviate from the baseline behavior.
Identify unusual events such as the repo.download_zip operation, which should not occur routinely via OAuth tokens in a standard environment. If this event is observed across multiple repositories, it should trigger an immediate investigation.

Unusual Activity Alerts

Configure your logging system to generate alerts for anomalous activities. For instance, detect and alert on bulk repository cloning, unusual data exfiltration patterns, or frequent access from atypical geographic locations.
Implement threshold-based alerting for specific operations, such as multiple repository download or clone events (for example, the repo.download_zip or git.clone audit log actions) or repository forks within a short time frame, which may indicate potential malicious activity.
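Threshold-based alerting can be sketched as a sliding window per actor. The example below counts ZIP download events (the `repo.download_zip` audit log action) and flags any actor that reaches a threshold within the window; the threshold of five events in ten minutes is a hypothetical starting point, and the events are assumed to carry `actor`, `action`, and a millisecond `@timestamp`.

```python
from collections import defaultdict, deque


def bulk_download_alerts(events, action="repo.download_zip",
                         threshold=5, window_ms=10 * 60 * 1000):
    """Return actors with at least `threshold` matching events inside one window."""
    recent = defaultdict(deque)  # actor -> timestamps still inside the window
    alerted = set()
    for event in sorted(events, key=lambda e: e["@timestamp"]):
        if event.get("action") != action:
            continue
        actor = event.get("actor", "unknown")
        window = recent[actor]
        window.append(event["@timestamp"])
        # Drop timestamps that have fallen out of the sliding window.
        while event["@timestamp"] - window[0] > window_ms:
            window.popleft()
        if len(window) >= threshold:
            alerted.add(actor)
    return alerted
```

Tune the threshold against your own baseline: a build service may legitimately download many archives, while the same pattern from a personal token is worth investigating.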

Response Strategies

When a token exposure incident occurs, immediate and precise actions are crucial to mitigate the potential damage. Follow these expert tactical guidelines to respond effectively:

Immediate Token Revocation and Rotation

Token Revocation

Immediately revoke all compromised tokens. Use GitHub’s API to automate the revocation process for OAuth tokens and personal access tokens.
Prioritize revoking tokens with the broadest access first to minimize the risk of extensive unauthorized activities.
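Prioritization by breadth of access can be made explicit with a simple sort. This sketch assumes an inventory where each token records whether it is organization-wide, whether it can write, and how many repositories it reaches; all field names and entries are hypothetical.

```python
def revocation_order(tokens):
    """Sort tokens so the broadest access is revoked first."""
    def breadth(token):
        # Organization-wide beats write access, which beats repository count.
        return (token["org_wide"], token["can_write"], token["repo_count"])
    return sorted(tokens, key=breadth, reverse=True)


# Hypothetical inventory entries for illustration.
tokens = [
    {"name": "docs-bot", "org_wide": False, "can_write": False, "repo_count": 1},
    {"name": "release-bot", "org_wide": True, "can_write": True, "repo_count": 120},
    {"name": "ci-deploy", "org_wide": False, "can_write": True, "repo_count": 4},
]
ordered_names = [token["name"] for token in revocation_order(tokens)]
```

Here the organization-wide, write-capable token is handled before the read-only, single-repository one.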

Token Rotation

Rotate all affected tokens to new ones. Ensure that the new tokens have the minimal required permissions as part of the principle of least privilege.
Update all dependent services and applications with the new tokens to restore normal operations. Prepare for potential downtime and communicate with stakeholders about the incident and recovery process.

Automated Processes

Implement scripts or automated workflows to expedite the revocation and rotation of tokens. Utilize tools like GitHub Actions or external automation platforms to streamline these processes.
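For tokens issued by an OAuth app you own, revocation can be scripted against the documented `DELETE /applications/{client_id}/token` endpoint, which authenticates with the app's client ID and client secret. The sketch below only builds the request; the function name and credentials are placeholders, and personal access tokens require a different path (for example, the user's settings or organization-level token management).

```python
import base64
import json
import urllib.request


def build_oauth_token_revocation(client_id, client_secret, access_token):
    """Build (but do not send) a request that revokes one OAuth app token."""
    # The endpoint uses HTTP Basic auth with the app's own credentials.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url=f"https://api.github.com/applications/{client_id}/token",
        data=json.dumps({"access_token": access_token}).encode("utf-8"),
        method="DELETE",
        headers={
            "Authorization": f"Basic {credentials}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values for illustration only.
revocation = build_oauth_token_revocation("my-client-id", "my-client-secret", "gho_example")
# Sending it would be: urllib.request.urlopen(revocation)
```

Wrapping this in an automated workflow lets responders revoke a list of compromised tokens in one pass instead of clicking through settings pages during an incident.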

Summary

Managing and securing your non-human identities (NHIs) across your entire ecosystem is absolutely critical, not just within GitHub but everywhere your digital operations span. NHIs, such as service accounts, API tokens, and automation scripts, have broad and powerful access that, if compromised, can lead to significant breaches and data loss. These identities often bypass traditional security measures designed for human users, making them prime targets for attackers. Ensuring their security involves stringent access controls, regular rotation, monitoring, and applying the principle of least privilege. Neglecting these practices can result in catastrophic vulnerabilities, exposing sensitive data, disrupting operations, and damaging your organization's reputation.
