

Introducing Experiments, an ongoing research effort from GitHub

Software developers are most productive when software development is inclusive and accessible. At GitHub, we conduct research in machine learning, design, and infrastructure to make sure everyone can do their best work with the next generation of developer tools and workflows.

This research can take considerable time to reach you, our end users, if it reaches you at all. We rigorously evaluate products for stability, performance, and security. And many experiments don’t meet our success criteria for product release, even when they present a path forward for future innovation.

Introducing Experiments

Although we can’t share everything we do, we’ve launched a collection of demonstrations highlighting our most exciting research projects—and the ideas behind them—with Experiments. We hope these will not only give you insight into our research but inspire you to think audaciously about the future of software development.

See our first experiment

For our first demo, we’ve chosen Semantic Code Search. We’ve used machine learning to build semantic representations of code that allow you to use natural language to search for code by intent, rather than just keyword matching. See our blog post for additional detail on how this works.

Semantic Code Search
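To make the idea concrete, here is a toy sketch of embedding-based search, not GitHub's production system: an embed function (a stand-in for a learned model) maps both queries and code snippets into one vector space, and results are ranked by cosine similarity rather than keyword overlap.

// Toy sketch of semantic search, NOT GitHub's actual system. `embed` is a
// placeholder for a learned model that maps natural language and code into
// the same vector space.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function semanticSearch(query, snippets, embed) {
  const queryVector = embed(query);
  return snippets
    .map((snippet) => ({
      snippet,
      score: cosineSimilarity(embed(snippet), queryVector), // intent match, not keywords
    }))
    .sort((a, b) => b.score - a.score); // highest similarity first
}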

We’re just getting started, so stay tuned for more examples. If this research excites you as much as it excites us, why not join our team?

Life as a GSoC student: What I learned about open source development through Probot

Abhijeet Pratap Singh is a student at the Indian Institute of Information Technology in Tiruchirappalli, India. He was selected to be one of the Google Summer of Code (GSoC) students for the Probot project. In this post, Abhijeet recounts his experience working with the Probot Team at GitHub and what he learned about working with other developers on an open source project.

This spring, I was selected to participate in Google Summer of Code (GSoC), a program that pairs student developers with open source projects. I was placed with the Probot Team at GitHub. Probot is an open source framework, built on NodeJS, for creating applications that improve and automate your workflow on GitHub.
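To give a feel for the framework, a minimal Probot app looks roughly like this (the auto-reply behavior is just an illustration, not the app I built):

// A minimal Probot app: reply to every newly opened issue.
module.exports = (app) => {
  app.on("issues.opened", async (context) => {
    // context.issue() fills in owner, repo, and issue number from the event payload
    const params = context.issue({ body: "Thanks for opening this issue!" });
    // Probot exposes an authenticated GitHub API client on the context
    // (context.github at the time; newer versions call it context.octokit)
    return context.github.issues.createComment(params);
  });
};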

Getting started with Probot

I’m a Computer Science and Engineering undergraduate at the Indian Institute of Information Technology. When I entered the Institute, I was eager to learn more about software development. I explored different opportunities and found out about GSoC through Quora earlier this year.

I learned about Probot while researching the different projects partnering with GSoC. I had some previous experience working with bots and NodeJS and decided to try out applications built with Probot. I found the Probot community, a Slack community of developers who use Probot, which pointed me to the Probot Summer of Code and Probot’s GSoC project ideas. The project idea list helped me identify what project I should submit a proposal for.

I initially focused on the Twitter integration proposed in Probot’s ideas repo. After a few days of tinkering, I successfully developed my first Twitter integration bot. I recorded a screencast to log my progress with the bot and shared it with Brandon Keepers (@bkeepers), Jason Etcovitch (@JasonEtco), and Gregor Martynus (@gr2m).

I later submitted proposals for the Twitter integration and Weekly Digest projects when the student application period started. My proposal for the Weekly Digest project was accepted, and I became the Student Developer for Probot. The results were announced on April 23. I was so happy to see that my project was accepted and was really excited for this summer. Gregor Martynus (@gr2m) and Wilhelm Klopp (@wilhelmklopp) were assigned as my mentors.

Working on the Weekly Digest project

GSoC kicked off in late April, and I spent more time exploring the community and its best practices. I also met my mentors over video conference, where we discussed a plan for moving forward and implementing the project. Then we created the Weekly Digest repository to track progress.

The coding round began in mid-May. I started by opening a few issues and pull requests, making some commits, and exploring GitHub’s REST API and GitHub’s REST API client for NodeJS. My mentors and I met weekly to review the work I had done, and they helped out whenever I got stuck. It was a great way to discuss best practices, standards, and tools that developers use.
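For instance, fetching the issues updated in the past week with the REST API client looks roughly like this (the client’s API has changed across versions, so treat this as a sketch):

const { Octokit } = require("@octokit/rest");

const octokit = new Octokit(); // pass { auth: token } to raise rate limits or reach private repos

async function issuesUpdatedInLastWeek(owner, repo) {
  const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString();
  // Note: this endpoint returns pull requests too; filter on the pull_request field if needed
  const { data } = await octokit.rest.issues.listForRepo({ owner, repo, state: "all", since });
  return data;
}

issuesUpdatedInLastWeek("probot", "weekly-digest").then((issues) => {
  console.log(`${issues.length} issues updated in the last week`);
});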

After working on the project for a while, I got the app very close to a pre-release state and released an alpha version of the Weekly Digest.

Here’s a GIF of the Weekly Digest app in action:

The Weekly Digest provides an overview of activity in your repositories

How the Weekly Digest app works

When you install the Weekly Digest in your (or your organization’s) repository, it curates the following data and publishes it as an issue:

  • Issues created in the last week
    • Opened issues
    • Closed issues
    • Noisy issues
    • Liked issues
  • Pull requests opened, updated, or merged in the last week
    • Opened pull requests
    • Updated pull requests
    • Merged pull requests
  • Commits made in the master branch in the last week
  • Contributors who made contributions in the last week
  • New stargazers, the fans who loved your repo enough to star it
  • Releases of the project you’re working on

The app, as the name suggests, generates these digests and publishes them on a weekly basis, typically on a Sunday. You can change the default configuration of the app by adding a .github/weekly-digest.yml file in your GitHub repository, which allows you to configure the publish date and the specific information included in the digest.
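A configuration file might look something like the sketch below; the exact keys the app supports are documented in the Weekly Digest repository, so treat these names as illustrative:

# .github/weekly-digest.yml (illustrative; see the app's documentation for the exact keys)
publishDay: sun               # day of the week the digest issue is opened
canPublishIssues: true
canPublishPullRequests: true
canPublishCommits: true
canPublishContributors: true
canPublishStargazers: false   # leave stargazers out of this repo's digest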

Install the Weekly Digest app

What I gained from this experience

Becoming a software developer has been my dream ever since I was a kid. My mentors were happy to answer all of my questions and took time out of their busy schedules to meet with me. Through GSoC, I learned how a project is maintained and how developers work together to deliver solutions. And I got to develop a love for open source through my work with Probot!

I was really lucky to be guided by my mentors, Gregor Martynus (@gr2m) and Wilhelm Klopp (@wilhelmklopp). Special thanks to them, Brandon Keepers (@bkeepers), Jason Etcovitch (@JasonEtco), and the awesome Probot community for accepting me as a Student Developer for this year’s GSoC!

Learn more about Probot

Using Figma designs to build the Octicons icon library

Recently, our friends at Figma announced their new Figma platform, and we’ve been really excited about its potential. We immediately put the platform to use with Octicons, our SVG icon library.

Distributing design assets effectively

Previously, we checked our asset files into the GitHub repository. This workflow was restrictive and confusing for contributors who might want to iterate on or update an Octicon. We wanted anyone to be able to make contributions, but a contributor needed all of the following in place before they could contribute:

  • Specific software: We kept the icons in software specific to macOS, so contributors needed that software installed, at the same version pinned in the Octicons repository.
  • Experience with Git: Contributors needed to clone the repository, edit the design asset, and check it back in. This required knowledge of Git and of how to get out of trouble when the binary file wouldn’t merge properly.
  • Prior repository setup: After making changes, contributors needed to run specialized scripts to output the assets, which meant having compatible versions of all the tooling installed correctly on their machine.

Figma to the rescue

To support your project’s contributors, it’s important to make the contributing experience as frictionless as possible. Migrating Octicons to Figma let us cut the most painful steps out of our previous workflow. And with the Figma API available to automate the work, contributors can use a powerful, platform-agnostic design tool without any overly complex setup.

Getting robots to do the work

Robots are great for doing repeatable tasks, and handing that work off to automated systems frees us up to think about the big picture. We lean on continuous integration to build, export, and distribute the icons.

Continuous integration (CI)

On every pull request, we use CI to export our icons from the Figma file and distribute alpha versions of the libraries.

CI on pull requests
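The export step of that CI job might look roughly like this sketch, which calls Figma’s images endpoint to render icon nodes as SVG. FIGMA_TOKEN and FIGMA_FILE_KEY are assumed CI secrets, and the node IDs would come from walking the file’s document tree:

// Sketch of a CI export step using the Figma REST API (Node 18+, global fetch).
// FIGMA_TOKEN and FIGMA_FILE_KEY are assumed to be provided by the CI environment.
async function exportIconsAsSvg(nodeIds) {
  const fileKey = process.env.FIGMA_FILE_KEY;
  const response = await fetch(
    `https://api.figma.com/v1/images/${fileKey}?ids=${nodeIds.join(",")}&format=svg`,
    { headers: { "X-Figma-Token": process.env.FIGMA_TOKEN } }
  );
  const { images } = await response.json(); // maps node ID -> temporary SVG URL
  return Promise.all(
    Object.entries(images).map(async ([id, url]) => {
      const svg = await (await fetch(url)).text();
      return { id, svg }; // ready to be written out as individual .svg files
    })
  );
}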

Before and after images

We also take advantage of Probot, a framework that makes it easy to automate GitHub tasks.

Our Probot app checks pull requests on Octicons for changes to the Figma source URL. When it finds one, it queries Figma’s platform for changes to any of the icons and comments on the pull request with before and after images. This makes the process easier for both contributors and maintainers.

Before and after images
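In outline, such an app might be shaped like the sketch below; figmaUrlChanged and renderBeforeAfter are placeholders for the real diffing and Figma API work, not functions from our codebase:

// Placeholder: in the real app, inspect the pull request diff for a changed Figma source URL.
async function figmaUrlChanged(context) {
  return null;
}

// Placeholder: in the real app, query Figma's platform and build a Markdown body with images.
async function renderBeforeAfter(change) {
  return "Before and after images go here.";
}

module.exports = (app) => {
  app.on(["pull_request.opened", "pull_request.synchronize"], async (context) => {
    const change = await figmaUrlChanged(context);
    if (!change) return; // no icon-related change in this pull request
    const body = await renderBeforeAfter(change);
    // Pull request comments go through the issues API
    await context.github.issues.createComment(context.issue({ body }));
  });
};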

Excited about the future

The API’s potential is the most exciting part, and we can’t wait to see how it improves our workflow. The Design Systems Team at GitHub is made up of designers with an engineering focus. We want to keep our components in code, then make them available for our designers to prototype with.

The upcoming Figma write API will allow us to maintain our component library in code and export those as Figma components. Using a team library we can publish updates and make them available to the GitHub Product Design Team to use in their design mockups and prototypes.

Four years of the GitHub Security Bug Bounty

Last month GitHub celebrated the fourth year of our Security Bug Bounty program. As we’ve done in the past, we’re sharing some details and highlights from 2017 and looking ahead to where we see the program going in 2018.

2017 in review

Last year was our biggest year yet, as the Bug Bounty program continued to grow in researcher participation, program initiatives, and rewards paid out.

Diving straight into the numbers, we can review the details of this growth. In 2017, we reviewed and triaged a total of 840 submissions to the program. Of these submissions, we resolved and rewarded a total of 121 reports with an average payout of $1,376 (and swag!). Compared to our previous statistics for 2016, this was a significant increase from 48 out of 795 reports being resolved. In 2017, our rate of valid reports increased from 6% to almost 15%.

Our total payouts also saw a significant increase, from $95,300 in 2016 to $166,495 in 2017. We attribute this to the increased number of valid reports and to the payout structure we re-evaluated in October. To coincide with HackerOne’s Hack the World competition, we doubled our payout amounts across the board, raising our minimum and maximum payouts to $555 and $20,000 and bringing our bug bounty in line with the industry’s top programs.

2017 initiatives

To accelerate the program’s growth in 2017, we launched a number of initiatives to engage researchers. Among the changes was the addition of GitHub Enterprise to the scope of the Bug Bounty program, which let researchers focus on areas of our applications that may not be exposed on GitHub.com or that are specific to certain enterprise deployments. Early in 2017, a number of reports impacting our enterprise authentication methods prompted us not only to focus on this area internally, but also to find ways to direct researchers toward this functionality. To promote a more targeted review of these critical code paths, we kicked off two new initiatives beyond our public Bug Bounty program.

Researcher grants

Providing researcher grants has been on our radar since Google launched its Vulnerability Research Grants in 2015. The basic premise is that we pay a researcher a fixed amount to dig into a specific feature or area of the application. In addition to the fixed grant payment, any vulnerabilities identified are also rewarded through the Bug Bounty program. Early in the year, we identified a researcher who specializes in assessing troublesome enterprise authentication methods, reached out, and launched our first researcher grant. We couldn’t have been happier with the results: the grant provided a depth of expertise and review that was well worth the extra monetary incentive.

Private bug bounty

In March 2017 we launched GitHub for Business, bringing enterprise authentication to organizations on GitHub.com. We used this feature launch as an opportunity to roll out a new part of the Bug Bounty program: private bug bounties. Through a private program on HackerOne, we invited all researchers who had previously participated in our program to access this functionality before its public launch. This supplemented our internal pre-ship security assessments with review by external researchers and helped us identify and remediate issues before general exposure. With the extra review, we were able to limit the impact of vulnerabilities in production while also giving researchers fresh code and functionality to look into.

Operational efficiency

Internal improvements to the program have helped us more efficiently triage and remediate submissions from researchers. ChatOps and GitHub-based workflows are core to how we deal with incoming submissions. As soon as new ones arrive, we receive alerts in Slack using HackerOne’s Slack integration. From there, we can triage issues directly from chat, letting the team know which issues are critical and which can wait until later. At the end of our triage workflow, we use ChatOps to issue rewards through HackerOne, so we can close the loop and pay researchers as quickly as possible.

To support these workflows, we’ve continued to build on our Ruby on Rails HackerOne API client and extensively use these and GitHub APIs in our internal processes.

So far, these improvements have made us significantly more efficient. Our average response time in 2017 was 10 hours, valid issues were triaged to developers on average within two days, and bounties were rewarded on average in 17 days. Given the time and effort that researchers dedicate to participating in our program, we feel great about these improvements. And in 2018, we’ll continue to refine our process. We’re always looking for ways to make sure our researchers receive a prompt and satisfactory response to their submissions.

What’s next?

In 2018, we’re planning to expand the initiatives that proved so successful last year. We’ll launch more private bounties and research grants to focus attention on specific features, both before and after they publicly launch. Later in the year, we’ll announce additional promotions to keep researchers interested and excited to participate.

Given the program’s success, we’re also looking to see how we can expand its scope to help secure our production services and protect GitHub’s ecosystem. We’re excited for what’s next and look forward to triaging and fixing your submissions this year!

Measuring the many sizes of a Git repository

Is your Git repository bursting at the seams? git-sizer is a new open source tool that can tell you when your repo is getting too big. git-sizer computes various Git repository size metrics and alerts you to any that might cause problems or inconvenience.

What is “big”?

When people talk about the size of a Git repository, they often talk about the total size needed by Git to store the project’s history in its internal, highly-compressed format—basically, the amount of disk space used by the .git directory. This number is easy to measure. It’s also useful, because it indicates how long it takes to clone the repository and how much disk space it will use.
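Assuming a Unix-like shell, two quick ways to check this number:

$ du -sh .git             # total disk space used by Git's internal storage
$ git count-objects -v -H # object counts and pack sizes in human-readable units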

At GitHub we host over 78 million Git repositories, so we’ve seen it all. What we find is that many of the repositories that tax our servers the most are not unusually big. The most challenging repositories to host are often those that have an unusual internal layout that Git is not optimized for.

Many properties aside from overall size can make a Git repository unwieldy. For example:

  • It could contain an astronomical number of Git objects (which are used to store the repository’s history)

  • The total size of the Git objects could be huge when uncompressed (even though their size is reasonable when compressed)

  • When the repository is checked out, the size of the working copy might be gigantic

  • The repository could have an unreasonable number of commits in its history

  • It could include enormous individual files or directories

  • It could contain large files or directories that have been modified many times

  • It could contain too many references (branches, tags, etc.)

Any of these properties, if taken to an extreme, can cause certain Git operations to perform poorly. And surprisingly, a repository can be grossly oversized in almost any of these ways without using a worrying amount of disk space.

It also makes sense to consider whether the size of your repository is commensurate with the type and scope of your project. The Linux kernel has been developed over 25 years by thousands of contributors, so it is not at all alarming that it has grown to 1.5 GB. But if your weekend class assignment is already 1.5 GB, that’s probably a strong hint that you could be using Git more effectively!

Sizing up your repository

You can use git-sizer to measure many size-related properties of your repository, including all of those listed above. To do so, you’ll need a local clone of the repository and a copy of the Git command-line client installed and in your execution PATH. Then:

  1. Install git-sizer
  2. Change to the directory containing your repository
  3. Run git-sizer, as shown below. You can learn about its command-line options by running git-sizer --help, but no options are required
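For example, on macOS with Homebrew (the project page lists other installation options), a full run looks like:

$ brew install git-sizer
$ cd /path/to/your/repository
$ git-sizer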

git-sizer will gather statistics about all of the references and reachable Git objects in your repository and output a report. For example, here is the verbose output for the Linux kernel repository:

$ git-sizer --verbose
Processing blobs: 1652370
Processing trees: 3396199
Processing commits: 722647
Matching commits to trees: 722647
Processing annotated tags: 534
Processing references: 539
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |   723 k   | *                              |
|   * Total size               |   525 MiB | **                             |
| * Trees                      |           |                                |
|   * Count                    |  3.40 M   | **                             |
|   * Total size               |  9.00 GiB | ****                           |
|   * Total tree entries       |   264 M   | *****                          |
| * Blobs                      |           |                                |
|   * Count                    |  1.65 M   | *                              |
|   * Total size               |  55.8 GiB | *****                          |
| * Annotated tags             |           |                                |
|   * Count                    |   534     |                                |
| * References                 |           |                                |
|   * Count                    |   539     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  72.7 KiB | *                              |
|   * Maximum parents      [2] |    66     | ******                         |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |  1.68 k   |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  13.5 MiB | *                              |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |   136 k   |                                |
| * Maximum tag depth      [5] |     1     | *                              |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [6] |  4.38 k   | **                             |
| * Maximum path depth     [7] |    14     | *                              |
| * Maximum path length    [8] |   134 B   | *                              |
| * Number of files        [9] |  62.3 k   | *                              |
| * Total size of files    [9] |   747 MiB |                                |
| * Number of symlinks    [10] |    40     |                                |
| * Number of submodules       |     0     |                                |

[1]  91cc53b0c78596a73fa708cceb7313e7168bb146
[2]  2cde51fbd0f310c8a2c5f977e665c0ac3945b46d
[3]  4f86eed5893207aca2c2da86b35b38f2e1ec1fc8 (refs/heads/master:arch/arm/boot/dts)
[4]  a02b6794337286bc12c907c33d5d75537c240bd0 (refs/heads/master:drivers/gpu/drm/amd/include/asic_reg/vega10/NBIO/nbio_6_1_sh_mask.h)
[5]  5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c (refs/tags/v2.6.11)
[6]  1459754b9d9acc2ffac8525bed6691e15913c6e2 (589b754df3f37ca0a1f96fccde7f91c59266f38a^{tree})
[7]  78a269635e76ed927e17d7883f2d90313570fdbc (dae09011115133666e47c35673c0564b0a702db7^{tree})
[8]  ce5f2e31d3bdc1186041fdfd27a5ac96e728f2c5 (refs/heads/master^{tree})
[9]  532bdadc08402b7a72a4b45a2e02e5c710b7d626 (e9ef1fe312b533592e39cddc1327463c30b0ed8d^{tree})
[10] f29a5ea76884ac37e1197bef1941f62fda3f7b99 (f5308d1b83eba20e69df5e0926ba7257c8dd9074^{tree})

The git-sizer project page explains the output in detail. The most interesting thing to look at is the “level of concern” column, which gives a rough indication of which parameters are high compared with a typical, modest-sized Git repository. A lot of asterisks would suggest that your repository is stretching Git beyond its sweet spot, and that some Git operations might be noticeably slower than usual. If you see exclamation marks instead of asterisks in this column, then you likely have a problem that needs addressing.

As you can see from the output, even though the Linux kernel is a big project by most standards, it is fairly well-balanced and none of its parameters have extreme values. Some Git operations will certainly take longer than they would in a small repository, but not unreasonably, and not out of proportion to the scope of the project. The kernel project is comfortably manageable in Git.

If the git-sizer analysis flags any problems in your repository, we suggest referring again to the git-sizer project page, where you will find many suggestions and resources for improving the structure of your Git repository. Please note that by far the easiest time to improve your repository’s structure is when you are just beginning to use Git, for example when migrating a repository from another version control system, before many developers have started cloning and contributing to it. And keep in mind that repositories only grow over time, so it is preferable to establish good practices early.

Summary

Git is famous for its speed and ability to deal with even quite large development projects. But every system has its limits, and if you push its limits too hard, your experience might suffer. git-sizer can help you evaluate whether your Git repository will live happily within Git, or whether it would be advisable to slim it down to make your Git experience as delightful as it can be.

Getting involved: git-sizer is open source! If you’d like to report bugs or contribute new features, head over to the project page.
