Back to GitHub Support Contact GitHub

Engineering - page 4

Integrations moves into pre-release with new features

Last September, we launched the Early Access Program of GitHub Integrations, a new option for extending GitHub. We’ve recently added some new features and moved Integrations into pre-release so you can begin using it within your production workflows. Here’s a summary of the latest features. You can learn more about what’s changed from our Developer Blog.

Enabling users to log in from your Integration

Users can now log in with your Integration using the OAuth protocol, allowing you to identify users and display data to them from the relevant installations. Additionally, an Integration can now make authorized API requests on behalf of a user; for example, to deploy code or create an issue. Learn more about authenticating as a user via an Integration.

Updating an Integration’s permissions

When you create an Integration, you have to specify which permissions it needs; for example, the ability to read issues or create deployments. Now you can update the requested permissions via Settings > Developer settings > Integrations, whenever the needs of your Integration change. Users will be prompted to accept these changes and enable the new permissions on their installation.

Post-installation setup

Finally, you now have the option to configure a Setup URL to which you can redirect users after they install your integration if any additional setup is required on your end.


More information on getting started can be found on our Developer Blog and in our documentation. We’d also love to hear what you think. Talk to us in the new Platform forum.

Git LFS 2.1.0 released

Today we’re announcing the next major release of Git LFS: v2.1.0, including new features, performance improvements, and more.

New features

Status subcommand

With Git LFS 2.1.0, get a more comprehensive look at which files are marked as modified by running the git lfs status command.

Git LFS will now tell you if your large files are being tracked by LFS, stored by Git, or a combination of both. For instance, if LFS sees a file that is stored as a large object in Git, it will convert it to an LFS pointer on checkout which will mark the file as modified. To diagnose this, try git lfs status for a look at what’s going on:

On branch master

Git LFS objects to be committed:

converted-lfs-file.dat (Git: acfe112 -> LFS: acfe112)
new-lfs-file.dat (LFS: eeaf82b)
partially-staged-lfs-file.dat (LFS: a12942e)

Git LFS objects not staged for commit:

unstaged-lfs-file.dat (LFS: 1307990 -> File: 8735179)
partially-staged-lfs-file.dat (File: 0dc8537)

Per-server configuration

Git LFS 2.1.0 introduces support for URL-style configuration via your .gitconfig or .lfsconfig. For settings that apply to URLs, like http.sslCert or lfs.locksverify, you can now scope them to a top-level domain, a root path, or just about anything else.

Network debugging tools

To better understand and debug network requests made by Git LFS, version 2.1.0 introduces a detailed view via the GIT_LOG_STATS=1 environment variable:

$ GIT_LOG_STATS=1 git lfs pull
Git LFS: (201 of 201 files) 1.17 GB / 1.17 GB

$ cat .git/lfs/objects/logs/http/*.log
concurrent=15 time=1493330448 version=git-lfs/2.1.0 (GitHub; darwin amd64; go 1.8.1; git f9d0c49e)
key=lfs.batch event=request url= method=POST body=8468
key=lfs.batch event=request url= method=POST status=200 body=59345 conntime=47448309 dnstime=2222176 tlstime=163385183 time=491626933 event=request url= method=GET status=200 body=4 conntime=58735013 dnstime=6486563 tlstime=120484526 time=258568133 event=request url= method=GET status=200 body=4 conntime=58231460 dnstime=6417657 tlstime=122486379 time=261003305

# ...

Relative object expiration

The Git LFS API has long supported an expires_at property in both SSH authenticate as well as Batch API responses. This introduced a number of issues where an out-of-sync system clock would cause LFS to think that objects were expired when they were still valid. Git LFS 2.1.0 now supports an expires_in property to specify a duration relative to your computer’s time to expire the object.

What’s next?

The LFS team is working on a migration tool to easily migrate your existing Git repositories with large objects into LFS without the need to write a git filter-branch command. We’re also still inviting your feedback on our File Locking feature.

In addition, our roadmap is public: comments, questions, and pull requests are welcomed. To learn more about Git LFS, visit the Git LFS website.

That was a quick overview of some of the larger changes included in this release. To get a more detailed look, check out the release notes.

SHA-1 collision detection on

  • Mar 20, 2017
  • peff peff
  • Engineering

A few weeks ago, researchers announced SHAttered, the first collision of the SHA-1 hash function. Starting today, all SHA-1 computations on will detect and reject any Git content that shows evidence of being part of a collision attack. This ensures that GitHub cannot be used as a platform for performing collision attacks against our users.

This fix will also be included in the next patch releases for the supported versions of GitHub Enterprise.

Why does SHA-1 matter to Git?

Git stores all data in “objects.” Each object is named after the SHA-1 hash of its contents, and objects refer to each other by their SHA-1 hashes. If two distinct objects have the same hash, this is known as a collision. Git can only store one half of the colliding pair, and when following a link from one object to the colliding hash name, it can’t know which object the name was meant to point to.

Two objects colliding accidentally is exceedingly unlikely. If you had five million programmers each generating one commit per second, your chances of generating a single accidental collision before the Sun turns into a red giant and engulfs the Earth is about 50%.

Why do collisions matter for Git’s security?

If a Git fetch or push tries to send a colliding object to a repository that already contains the other half of the collision, the receiver can compare the bytes of each object, notice the problem, and reject the new object. Git has implemented this detection since its inception.

However, SHA-1 names can be assigned trust through various mechanisms. For instance, Git allows you to cryptographically sign a commit or tag. Doing so signs only the commit or tag object itself, which in turn points to other objects containing the actual file data by using their SHA-1 names. A collision in those objects could produce a signature which appears valid, but which points to different data than the signer intended. In such an attack the signer only sees one half of the collision, and the victim sees the other half.

What would a collision attack against Git look like?

The recent attack cannot generate a collision against an existing object. It can only generate a colliding pair from scratch, where the two halves of the pair are similar but contain a small section of carefully-selected random data that differs.

An attack therefore would look something like this:

  1. Generate a colliding pair, where one half looks innocent and the other does something malicious. This is best done with binary files where humans are unlikely to notice the difference between the two halves (the recent attack used PDFs for this purpose).

  2. Convince a project to accept your innocent half, and wait for them to sign a tag or commit that contains it.

  3. Distribute a copy of the repository with the malicious half (either by breaking into a hosting server and replacing the innocent object on disk, or hosting it elsewhere and asking people to verify its integrity based on the signatures). Anybody verifying the signature will think the contents match what the project owners signed.

How is GitHub protecting against collision attacks?

Generating a collision via brute-force is computationally too expensive, and will remain so for the foreseeable future. The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair. now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the “innocent” half of their collision, as well as preventing them from hosting the malicious half.

The actual detection code is open-source and was written by Marc Stevens (whose work is the basis of the SHAttered attack) and Dan Shumow. We are grateful for their work on that project.

Are there Git collisions?

Not yet. Git’s object names take into account not only the raw bytes of the files, but also some Git-specific header information. The PDFs provided by the SHAttered researchers collide in their raw bytes, but not when added to a Git repository. The same technique could be used to generate a Git object collision, but like the generation of the original SHAttered PDFs, it would require spending hundreds of thousands of dollars in computation.

What future work is there?

Blocking collisions that pass through GitHub is only the first step. We’ve already been working with the Git project to include the collision detection library upstream. Future versions of Git will be able to detect and reject colliding halves no matter how they reach the developer: fetching from other hosting sites, applying patches, or generating objects from local data.

The Git project is also developing a plan to transition away from SHA-1 to another, more secure hash algorithm, while minimizing the disruption to existing repository data. As that work matures, we plan to support it on GitHub.

Bug Bounty third anniversary wrap-up

Wrapping up GitHub Bug Bounty Third Year Anniversary Promotion

In honor of our Bug Bounty Program’s third birthday, we kicked off a promotional bounty period in January and February. In addition to bonus payouts, the scope of the bug bounty was expanded to include GitHub Enterprise. It may come as no surprise that including a new scope meant that the most severe bugs were all related to the newly included target.

There was no shortage of high-quality reports. Picking winners is always tough, but below are the intrepid researchers receiving extra bounties.

First prize

The first prize bonus of $12,000 goes to @jkakavas for their GitHub Enterprise SAML authentication bypass report which allowed an attacker to construct a SAML response that could arbitrarily set the authenticated user account. You can read more about the story on their blog.

Second prize

The second prize bonus of $8,000 goes to @iblue for their remote code execution bug found in the GitHub Enterprise management console. This was due to a static secret mistakenly being used to cryptographically sign the session cookie. The static secret was intended to only be used for testing and development. However, an unrelated change of file permissions prevented the intended (and randomly generated) session secret from being used. By knowing this secret, an attacker could forge a cookie that is deserialized by Marshal.load, leading to remote code execution.

Third prize

The third prize bonus of $5,000 goes to @soby for their report of another GitHub Enterprise SAML authentication bypass. This attack showed it was possible to replay a SAML response to have our SAML implementation use unsigned data in determining which user account was authenticated.

Best report

The best report bonus of $5,000 goes to @orangetw for their report of Server-Side Request Forgery. This report involved chaining together four vulnerabilities to deliver requests to internal services that end up executing attacker-controlled code. The reporter supplied a clear explanation of the problem through each step and included proof-of-concept scripts for the entire journey. While this report did not earn a prize for being the most severe, it is exactly the type of report we want to encourage reporters to submit.

Other notable reports

We received a wonderful Christmas gift from @orangetw with a SQL Injection bug on GitHub Enterprise. You can read more about how they learned Rails in three days before finding the bug.

A theme we’ve seen continue over the years is paying out for bugs that aren’t in our own code but in browsers. 2016 was no exception with @filedescriptor’s report of funky Internet Explorer behavior detailing how a triple-encoded host value in a URL is handled in redirects.

Lastly, Unicode gonna Unicode. @jagracy found a way to exploit the way Unicode was normalized in our code and in our database engine to deliver password reset emails to entirely different addresses than what were intended.

Some statistics

Our “Two Years of Bounties” post has detailed stats for our submissions during the first two years of the program. These years saw a total payout of $95,300 across 102 submissions out of a total of 7,050 submissions (1.4% validity rate). So far in the third year of the program, we have paid out for 73 submissions for a total of $81,700. Many of these reports fall into our “$200 thank you” bucket. These are issues that we do not consider severe enough for an immediate fix, but that we still want to reward our researchers for. Forty-eight of these issues were deemed high enough risk to warrant a write-up on We saw a total of 48 out of 795 valid reports, bringing our validity rate to 6%.

In 2016, we saw a slight decrease in the number of reports compared to the average of the previous two years. While we can’t know for certain, we suspect this is due to the ever-decreasing presence of “low hanging fruit.” Unfortunately, we saw an increase in the number of “critical” reports. All other categories saw a decrease in the number of reports.

Out of the accepted reports, we saw some notable changes in the type of vulnerability reports submitted. Many categories such as XSS, Injection, and CSRF saw a decrease in reports. Notably, the number of valid CSRF reports for 2016 was zero. At the same time, we saw an increase in session handling bugs, sensitive data exposure, and missing function level access controls.

In April of 2016, we transitioned to the HackerOne platform. The graphs will not include all data for the year, unfortunately, but the data is still interesting. Notably, even though we had run a public bounty for almost 2.4 years, we still experienced a large spike upon announcing the transition, just like many programs’ initial public launch.

We’ve also been able to track new data via the platform. For example, our average response was about 16 hours and our time to resolution was about 28 days. Tracking this data and keeping it within acceptable bounds will help ensure our program continues to run smoothly and efficiently.

It’s more than just a bounty

We continue to see rewards being donated to charities. We absolutely love donating bounties, and we match all contributions. This year saw donations to Doctors without Borders and the Electronic Frontier Foundation (EFF).

We also sponsored an event that was aimed at helping people from under-represented backgrounds participate in bug bounties. The h1-415 event saw attendance from groups like Hack the Hood, Women in Security and Privacy (WISP), FemHacks, Lairon College Cyber Patriots, and /dev/color. Hack the Hood was nice enough to make a video from the event.

In 2016, we’ve learned a lot about running a bug bounty program and we’ve continued to uncover sharp edges of our services and codebase. Our program will continue to evolve to engage and support the community of bounty hunters that have made this program successful. We look forward to your submission in our fourth year of the program!

Best regards and happy hacking,


A formal spec for GitHub Flavored Markdown

  • Mar 14, 2017
  • vmg vmg
  • Engineering

Starting today, all Markdown user content hosted in our website, including user comments, wikis, and .md files in repositories will be parsed and rendered following a formal specification for GitHub Flavored Markdown. We hope that making this spec available will allow third parties to better integrate and preview GFM in their software.

The full details of the specification are available in our Engineering Blog.

This project is based on CommonMark, a joint effort to specify and unify the different Markdown dialects that are currently available. We’ve updated the original CommonMark spec with formal definitions for the custom Markdown features that are commonly used in GitHub, such as tables, task lists, and autolinking.

Together with the specification, we’re also open-sourcing a reference implementation in C, based on the original cmark parser, but with support for all the features that are actively used in GitHub. This is the same implementation we use in our backend.

How will this affect me and my projects?

A lot of care and research has been put into designing the CommonMark spec to make sure it specifies the syntax and semantics of Markdown in a way that represents the existing real-world usage.

Because of this, we expect that the vast majority of Markdown documents on GitHub will not be affected at all by this change. Some documents sporting the most arcane features of Markdown may render differently. Check out our extensive GitHub Engineering blog post for more details on what has changed.


Discover new ways to build better

Try Marketplace apps free for 14 days

Learn more