Engineering - page 5

SHA-1 collision detection on

  • Mar 20, 2017
  • peff peff
  • Engineering

A few weeks ago, researchers announced SHAttered, the first collision of the SHA-1 hash function. Starting today, all SHA-1 computations on will detect and reject any Git content that shows evidence of being part of a collision attack. This ensures that GitHub cannot be used as a platform for performing collision attacks against our users.

This fix will also be included in the next patch releases for the supported versions of GitHub Enterprise.

Why does SHA-1 matter to Git?

Git stores all data in “objects.” Each object is named after the SHA-1 hash of its contents, and objects refer to each other by their SHA-1 hashes. If two distinct objects have the same hash, this is known as a collision. Git can only store one half of the colliding pair, and when following a link from one object to the colliding hash name, it can’t know which object the name was meant to point to.

Two objects colliding accidentally is exceedingly unlikely. If you had five million programmers each generating one commit per second, your chances of generating a single accidental collision before the Sun turns into a red giant and engulfs the Earth is about 50%.

Why do collisions matter for Git’s security?

If a Git fetch or push tries to send a colliding object to a repository that already contains the other half of the collision, the receiver can compare the bytes of each object, notice the problem, and reject the new object. Git has implemented this detection since its inception.

However, SHA-1 names can be assigned trust through various mechanisms. For instance, Git allows you to cryptographically sign a commit or tag. Doing so signs only the commit or tag object itself, which in turn points to other objects containing the actual file data by using their SHA-1 names. A collision in those objects could produce a signature which appears valid, but which points to different data than the signer intended. In such an attack the signer only sees one half of the collision, and the victim sees the other half.

What would a collision attack against Git look like?

The recent attack cannot generate a collision against an existing object. It can only generate a colliding pair from scratch, where the two halves of the pair are similar but contain a small section of carefully-selected random data that differs.

An attack therefore would look something like this:

  1. Generate a colliding pair, where one half looks innocent and the other does something malicious. This is best done with binary files where humans are unlikely to notice the difference between the two halves (the recent attack used PDFs for this purpose).

  2. Convince a project to accept your innocent half, and wait for them to sign a tag or commit that contains it.

  3. Distribute a copy of the repository with the malicious half (either by breaking into a hosting server and replacing the innocent object on disk, or hosting it elsewhere and asking people to verify its integrity based on the signatures). Anybody verifying the signature will think the contents match what the project owners signed.

How is GitHub protecting against collision attacks?

Generating a collision via brute-force is computationally too expensive, and will remain so for the foreseeable future. The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair. now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the “innocent” half of their collision, as well as preventing them from hosting the malicious half.

The actual detection code is open-source and was written by Marc Stevens (whose work is the basis of the SHAttered attack) and Dan Shumow. We are grateful for their work on that project.

Are there Git collisions?

Not yet. Git’s object names take into account not only the raw bytes of the files, but also some Git-specific header information. The PDFs provided by the SHAttered researchers collide in their raw bytes, but not when added to a Git repository. The same technique could be used to generate a Git object collision, but like the generation of the original SHAttered PDFs, it would require spending hundreds of thousands of dollars in computation.

What future work is there?

Blocking collisions that pass through GitHub is only the first step. We’ve already been working with the Git project to include the collision detection library upstream. Future versions of Git will be able to detect and reject colliding halves no matter how they reach the developer: fetching from other hosting sites, applying patches, or generating objects from local data.

The Git project is also developing a plan to transition away from SHA-1 to another, more secure hash algorithm, while minimizing the disruption to existing repository data. As that work matures, we plan to support it on GitHub.

Bug Bounty third anniversary wrap-up

Wrapping up GitHub Bug Bounty Third Year Anniversary Promotion

In honor of our Bug Bounty Program’s third birthday, we kicked off a promotional bounty period in January and February. In addition to bonus payouts, the scope of the bug bounty was expanded to include GitHub Enterprise. It may come as no surprise that including a new scope meant that the most severe bugs were all related to the newly included target.

There was no shortage of high-quality reports. Picking winners is always tough, but below are the intrepid researchers receiving extra bounties.

First prize

The first prize bonus of $12,000 goes to @jkakavas for their GitHub Enterprise SAML authentication bypass report which allowed an attacker to construct a SAML response that could arbitrarily set the authenticated user account. You can read more about the story on their blog.

Second prize

The second prize bonus of $8,000 goes to @iblue for their remote code execution bug found in the GitHub Enterprise management console. This was due to a static secret mistakenly being used to cryptographically sign the session cookie. The static secret was intended to only be used for testing and development. However, an unrelated change of file permissions prevented the intended (and randomly generated) session secret from being used. By knowing this secret, an attacker could forge a cookie that is deserialized by Marshal.load, leading to remote code execution.

Third prize

The third prize bonus of $5,000 goes to @soby for their report of another GitHub Enterprise SAML authentication bypass. This attack showed it was possible to replay a SAML response to have our SAML implementation use unsigned data in determining which user account was authenticated.

Best report

The best report bonus of $5,000 goes to @orangetw for their report of Server-Side Request Forgery. This report involved chaining together four vulnerabilities to deliver requests to internal services that end up executing attacker-controlled code. The reporter supplied a clear explanation of the problem through each step and included proof-of-concept scripts for the entire journey. While this report did not earn a prize for being the most severe, it is exactly the type of report we want to encourage reporters to submit.

Other notable reports

We received a wonderful Christmas gift from @orangetw with a SQL Injection bug on GitHub Enterprise. You can read more about how they learned Rails in three days before finding the bug.

A theme we’ve seen continue over the years is paying out for bugs that aren’t in our own code but in browsers. 2016 was no exception with @filedescriptor’s report of funky Internet Explorer behavior detailing how a triple-encoded host value in a URL is handled in redirects.

Lastly, Unicode gonna Unicode. @jagracy found a way to exploit the way Unicode was normalized in our code and in our database engine to deliver password reset emails to entirely different addresses than what were intended.

Some statistics

Our “Two Years of Bounties” post has detailed stats for our submissions during the first two years of the program. These years saw a total payout of $95,300 across 102 submissions out of a total of 7,050 submissions (1.4% validity rate). So far in the third year of the program, we have paid out for 73 submissions for a total of $81,700. Many of these reports fall into our “$200 thank you” bucket. These are issues that we do not consider severe enough for an immediate fix, but that we still want to reward our researchers for. Forty-eight of these issues were deemed high enough risk to warrant a write-up on We saw a total of 48 out of 795 valid reports, bringing our validity rate to 6%.

In 2016, we saw a slight decrease in the number of reports compared to the average of the previous two years. While we can’t know for certain, we suspect this is due to the ever-decreasing presence of “low hanging fruit.” Unfortunately, we saw an increase in the number of “critical” reports. All other categories saw a decrease in the number of reports.

Out of the accepted reports, we saw some notable changes in the type of vulnerability reports submitted. Many categories such as XSS, Injection, and CSRF saw a decrease in reports. Notably, the number of valid CSRF reports for 2016 was zero. At the same time, we saw an increase in session handling bugs, sensitive data exposure, and missing function level access controls.

In April of 2016, we transitioned to the HackerOne platform. The graphs will not include all data for the year, unfortunately, but the data is still interesting. Notably, even though we had run a public bounty for almost 2.4 years, we still experienced a large spike upon announcing the transition, just like many programs’ initial public launch.

We’ve also been able to track new data via the platform. For example, our average response was about 16 hours and our time to resolution was about 28 days. Tracking this data and keeping it within acceptable bounds will help ensure our program continues to run smoothly and efficiently.

It’s more than just a bounty

We continue to see rewards being donated to charities. We absolutely love donating bounties, and we match all contributions. This year saw donations to Doctors without Borders and the Electronic Frontier Foundation (EFF).

We also sponsored an event that was aimed at helping people from under-represented backgrounds participate in bug bounties. The h1-415 event saw attendance from groups like Hack the Hood, Women in Security and Privacy (WISP), FemHacks, Lairon College Cyber Patriots, and /dev/color. Hack the Hood was nice enough to make a video from the event.

In 2016, we’ve learned a lot about running a bug bounty program and we’ve continued to uncover sharp edges of our services and codebase. Our program will continue to evolve to engage and support the community of bounty hunters that have made this program successful. We look forward to your submission in our fourth year of the program!

Best regards and happy hacking,


A formal spec for GitHub Flavored Markdown

  • Mar 14, 2017
  • vmg vmg
  • Engineering

Starting today, all Markdown user content hosted in our website, including user comments, wikis, and .md files in repositories will be parsed and rendered following a formal specification for GitHub Flavored Markdown. We hope that making this spec available will allow third parties to better integrate and preview GFM in their software.

The full details of the specification are available in our Engineering Blog.

This project is based on CommonMark, a joint effort to specify and unify the different Markdown dialects that are currently available. We’ve updated the original CommonMark spec with formal definitions for the custom Markdown features that are commonly used in GitHub, such as tables, task lists, and autolinking.

Together with the specification, we’re also open-sourcing a reference implementation in C, based on the original cmark parser, but with support for all the features that are actively used in GitHub. This is the same implementation we use in our backend.

How will this affect me and my projects?

A lot of care and research has been put into designing the CommonMark spec to make sure it specifies the syntax and semantics of Markdown in a way that represents the existing real-world usage.

Because of this, we expect that the vast majority of Markdown documents on GitHub will not be affected at all by this change. Some documents sporting the most arcane features of Markdown may render differently. Check out our extensive GitHub Engineering blog post for more details on what has changed.

Git LFS 2.0.0 released

Git LFS 2.0 is here Today we’re announcing the next major release of Git LFS: v2.0.0.

The official release notes have the complete list of all the new features, performance improvements, and more. In the meantime, here’s our look at a few of our newest features:

File locking

With Git LFS 2.0.0 you can now lock files that you’re actively working on, preventing others from pushing to the Git LFS server until you unlock the files again.

This will prevent merge conflicts as well as lost work on non-mergeable files at the filesystem level. While it may seem to contradict the distributed and parallel nature of Git, file locking is an important part of many software development workflows—particularly for larger teams working with binary assets.

# This tells LFS to track *.tga files and make them lockable.
# They will appear as read-only on the filesystem.
$ git lfs track "*.tga" --lockable

# To acquire the lock and make foo.tga writeable:
$ git lfs lock foo.tga

# foo.tga is now writeable

$ git add foo.tga
$ git commit ...
$ git push

# Once you're ready to stop work, release the file so others can work on it.
$ git lfs unlock foo.tga

Everything else

Git LFS v2.0.0 also comes with a host of other great features, bug fixes, and other changes.

Transfer queue

Our transfer queue, the mechanism responsible for uploading and downloading files, is faster, more efficient, and more resilient to failure.

To dive in, visit our release notes to learn more.


Git LFS has tremendously improved internals, particularly in Git and filesystem operations. push and pull operations have been optimized to run concurrently with the underlying tree scans necessary to detect LFS objects. Repositories with large trees can begin the push or pull operation immediately, while the tree scan takes place, greatly reducing the amount of time it takes to complete these operations.

The mechanism that scans for files tracked by LFS has been enhanced to ignore directories included in your repository’s .gitignore, improving the efficiency of these operations.

In Git LFS v1.5.0, we introduced the process filter (along with changes in Git v2.11) to dramatically improve performance across multiple platforms, thanks to contributions from @larsxschneider.

Thank you

Since its release, Git LFS has benefited from the contributions of 81 members of the open source community. There have been 1,008 pull-requests, 851 of which have been merged into an official release. Git LFS would not be possible without the gracious efforts of our wonderful contributors. Special thanks to @sinbad who contributed to our work on file locking.

What’s next?

File locking is an early release, so we’re eager to hear your feedback and thoughts on how the feature should work.

In addition, our roadmap is public: comments, questions (and pull requests) are welcomed. To learn more about Git LFS, visit the Git LFS website.

Psst! We also just announced the GitHub plugin for Unity, which brings the GitHub workflow to Unity, including support for Git LFS and file locking. Sign up for early access now.

New and improved two-factor lockout recovery process

Starting January 31, 2017, the Delegated Account Recovery feature will let you associate your GitHub account with your Facebook account, giving you a way back into GitHub in certain two-factor authentication lockout scenarios. If you’ve lost your phone or have otherwise lost the ability to use your phone or token without a usable backup, you can recover your account through Facebook and get back to work. See how the new recovery feature works on the GitHub Engineering Blog.

Image of recovery screen on Facebook

Currently, if you lose the ability to authenticate with your phone or token, you have to prove account ownership before we can disable two-factor authentication. Proving ownership requires access to a confirmed email address and a valid SSH private key for a given account. This feature will provide an alternative proof of account ownership that can be used along with these other methods.

To set up the new recovery option, save a token on the security settings page on GitHub. Then confirm that you’d like store the token. If you get locked out for any reason, you can contact GitHub Support, log in to Facebook, and start the recovery process.

Image of recovery option on GitHub



GitHub Universe logo

GitHub Universe

October 16-17 in San Francisco
Get tickets today

Discover new ways to build better

Try Marketplace apps free for 14 days

Learn more