GitHub data is available for public analysis using Google BigQuery, and we’d like to help you take it for a spin.
If you’d like to find out more about what data is available and how it’s been used so far, watch this conversation between GitHub Data Analyst Alyson La and Google Developer Advocate Felipe Hoffa. You’ll learn the story behind the datasets and what types of analysis they make possible. You’ll also see how we’ve visualized data with Tableau and Looker.
There’s a lot of data out there, but all of it is available through BigQuery in two large datasets. The original, community-led GitHub Archive project launched in 2012 and captures almost 30 million events monthly, including issues, commits, and pushes. Last year, we worked with Google to release the GitHub Public Data Set, a separate set of tables with information on all projects that have open source licenses, including commits, file contents, and file paths.
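If you want a quick starting point, here is a minimal sketch of two queries, one for each dataset. It assumes the GitHub Archive is exposed as one table per month named `githubarchive.month.YYYYMM` and that the GitHub Public Data Set includes a `bigquery-public-data.github_repos.licenses` table with a `license` column. Those table names reflect the public BigQuery listings as we understand them, so check the current schema before relying on them. The helper function names are hypothetical, and actually running a query needs the `google-cloud-bigquery` client library plus configured Google Cloud credentials.

```python
def archive_event_counts_sql(year: int, month: int) -> str:
    """Build a query that counts GitHub Archive events by type for one month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Assumed naming scheme: one table per month, e.g. githubarchive.month.201701.
    table = f"githubarchive.month.{year:04d}{month:02d}"
    return (
        "SELECT type, COUNT(*) AS events\n"
        f"FROM `{table}`\n"
        "GROUP BY type\n"
        "ORDER BY events DESC"
    )


# Assumed table in the GitHub Public Data Set: one row per repository and its license.
LICENSES_SQL = (
    "SELECT license, COUNT(*) AS repos\n"
    "FROM `bigquery-public-data.github_repos.licenses`\n"
    "GROUP BY license\n"
    "ORDER BY repos DESC"
)


def run_query(sql: str) -> list:
    """Run a query and return its rows as plain dicts.

    Requires the google-cloud-bigquery package and Google Cloud credentials.
    The import is deferred so the query builders above work without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]
```

For example, `run_query(archive_event_counts_sql(2017, 1))` would list event types for January 2017 by volume. Querying a single monthly table, rather than the whole archive, keeps the bytes scanned (and so the cost) down.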
You can also use the GHTorrent project to complement these datasets with additional metadata.