With GitStats you can pull git data from GitHub and GitLab in a uniform, tabular format. For the time being you can get data on:
- repositories,
- commits,
- users,
- release logs,
- content of text files,
- R package usage.
Installation
From CRAN:
install.packages("GitStats")
From GitHub:
devtools::install_github("r-world-devs/GitStats")
Examples:
Set up your GitStats:
library(GitStats)
git_stats <- create_gitstats() |>
  set_gitlab_host(
    repos = "mbtests/gitstatstesting"
  ) |>
  set_github_host(
    orgs = "r-world-devs",
    repos = "openpharma/DataFakeR"
  )
Get commits:
commits <- git_stats |>
  get_commits(
    since = "2022-01-01"
  )
commits
#> # A tibble: 2,178 × 11
#> id committed_date author author_login author_name additions deletions
#> <chr> <dttm> <chr> <chr> <chr> <int> <int>
#> 1 7f48… 2024-09-10 11:12:59 Macie… maciekbanas Maciej Ban… 0 0
#> 2 9c66… 2024-09-10 10:35:37 Macie… maciekbanas Maciej Ban… 0 0
#> 3 fca2… 2024-09-10 10:31:24 Macie… maciekbanas Maciej Ban… 0 0
#> 4 e8f2… 2023-03-30 14:15:33 Macie… maciekbanas Maciej Ban… 1 0
#> 5 7e87… 2023-02-10 09:48:55 Macie… maciekbanas Maciej Ban… 1 1
#> 6 62c4… 2023-02-10 09:17:24 Macie… maciekbanas Maciej Ban… 2 87
#> 7 55cf… 2023-02-10 09:07:54 Macie… maciekbanas Maciej Ban… 92 0
#> 8 C_kw… 2023-05-08 09:43:31 Kryst… krystian8207 Krystian I… 18 0
#> 9 C_kw… 2023-04-28 12:30:40 Kamil… <NA> Kamil Kozi… 18 0
#> 10 C_kw… 2023-03-01 15:05:10 Kryst… krystian8207 Krystian I… 296 153
#> # ℹ 2,168 more rows
#> # ℹ 4 more variables: repository <chr>, organization <chr>, repo_url <chr>,
#> # api_url <glue>
commits |>
  get_commits_stats(
    time_aggregation = "month",
    group_var = author
  )
#> # A tibble: 228 × 4
#> stats_date githost author stats
#> <dttm> <chr> <chr> <int>
#> 1 2022-01-01 00:00:00 github Admin_mschuemi 1
#> 2 2022-01-01 00:00:00 github Gowtham Rao 5
#> 3 2022-01-01 00:00:00 github Krystian Igras 1
#> 4 2022-01-01 00:00:00 github Martijn Schuemie 1
#> 5 2022-02-01 00:00:00 github Hadley Wickham 3
#> 6 2022-02-01 00:00:00 github Martijn Schuemie 2
#> 7 2022-02-01 00:00:00 github Maximilian Girlich 13
#> 8 2022-02-01 00:00:00 github Reijo Sund 1
#> 9 2022-02-01 00:00:00 github eitsupi 1
#> 10 2022-03-01 00:00:00 github Maximilian Girlich 14
#> # ℹ 218 more rows
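To make the shape of this result easier to reason about, here is a minimal base R sketch of a comparable monthly count per author. It is not how `get_commits_stats()` is implemented. The `toy_commits` data frame is invented for illustration and reuses only the `committed_date` and `author` column names from the output above.

```r
# Toy data: invented rows that reuse only column names shown in the output above.
toy_commits <- data.frame(
  committed_date = as.POSIXct(
    c("2022-01-03 10:00:00", "2022-01-17 09:30:00",
      "2022-01-20 14:45:00", "2022-02-02 08:15:00"),
    tz = "UTC"
  ),
  author = c("Author A", "Author B", "Author A", "Author A"),
  stringsAsFactors = FALSE
)

# Truncate each timestamp to the first day of its month.
toy_commits$stats_date <- as.Date(format(toy_commits$committed_date, "%Y-%m-01"))

# Count commits per month and author.
monthly_stats <- aggregate(
  list(stats = toy_commits$author),
  by = list(stats_date = toy_commits$stats_date, author = toy_commits$author),
  FUN = length
)

# Sort by month, then author, for a stable display order.
monthly_stats <- monthly_stats[order(monthly_stats$stats_date, monthly_stats$author), ]
monthly_stats
```

Each row of `monthly_stats` holds one month, one author, and the number of commits in that month, which mirrors the `stats_date`, `author`, and `stats` columns shown above.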
Get repositories:
git_stats |>
  get_repos(
    with_code = "shiny",
    add_contributors = FALSE
  )
#> # A tibble: 6 × 16
#> repo_id repo_name organization fullname platform repo_url api_url
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 627452680 hypothesis r-world-devs r-world-d… github https:/… https:…
#> 2 604718884 shinyTimelines r-world-devs r-world-d… github https:/… https:…
#> 3 495151911 shinyCohortBuilder r-world-devs r-world-d… github https:/… https:…
#> 4 495144469 cohortBuilder r-world-devs r-world-d… github https:/… https:…
#> 5 884789327 GitAI r-world-devs r-world-d… github https:/… https:…
#> 6 586903986 GitStats r-world-devs r-world-d… github https:/… https:…
#> # ℹ 9 more variables: created_at <dttm>, last_activity_at <dttm>,
#> # last_activity <drtn>, default_branch <chr>, stars <int>, forks <int>,
#> # languages <chr>, issues_open <int>, issues_closed <int>
Get files:
git_stats |>
  get_files(
    pattern = "\\.md",
    depth = 2L
  )
#> # A tibble: 51 × 8
#> repo_name repo_id organization file_path file_content file_size repo_url
#> <chr> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 GitStats Test… gid://… mbtests README.md "# GitStats… 122 https:/…
#> 2 shinyGizmo R_kgDO… r-world-devs NEWS.md "# shinyGiz… 2186 https:/…
#> 3 shinyGizmo R_kgDO… r-world-devs README.md "\n# shinyG… 2337 https:/…
#> 4 shinyGizmo R_kgDO… r-world-devs cran-com… "## Test en… 1700 https:/…
#> 5 cohortBuilder R_kgDO… r-world-devs NEWS.md "# cohortBu… 917 https:/…
#> 6 cohortBuilder R_kgDO… r-world-devs README.md "\n# cohort… 15828 https:/…
#> 7 shinyCohortBu… R_kgDO… r-world-devs NEWS.md "# shinyCoh… 2018 https:/…
#> 8 shinyCohortBu… R_kgDO… r-world-devs README.md "\n# shinyC… 3355 https:/…
#> 9 cohortBuilder… R_kgDO… r-world-devs README.md "\n# cohort… 3472 https:/…
#> 10 GitStats R_kgDO… r-world-devs LICENSE.… "# MIT Lice… 1075 https:/…
#> # ℹ 41 more rows
#> # ℹ 1 more variable: api_url <chr>
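The `pattern` value above looks like a regular expression. Assuming it is matched against file paths in the way base R's `grepl()` matches (an assumption, not something this README states), the sketch below shows the difference between an unanchored and an anchored pattern. The file paths are hypothetical.

```r
# Hypothetical file paths, invented for illustration.
paths <- c("README.md", "NEWS.md", "R/utils.R", "docs/index.mdx")

# Unanchored: matches ".md" anywhere in the path, so "docs/index.mdx" is included.
unanchored <- paths[grepl("\\.md", paths)]

# Anchored with `$`: keeps only paths that end in ".md".
anchored <- paths[grepl("\\.md$", paths)]

unanchored
anchored
```

If only Markdown files are wanted, the anchored form `"\\.md$"` may be the safer choice under that assumption.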
Get package usage:
git_stats |>
  get_R_package_usage(
    packages = c("shiny", "purrr"),
    split_output = TRUE
  )
#> $shiny
#> # A tibble: 5 × 11
#> package package_usage repo_id repo_fullname repo_name default_branch
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 shiny import 495144469 r-world-devs/cohor… cohortBu… dev
#> 2 shiny import, library 495151911 r-world-devs/shiny… shinyCoh… dev
#> 3 shiny import, library 604718884 r-world-devs/shiny… shinyTim… master
#> 4 shiny import, library 884789327 r-world-devs/GitAI GitAI main
#> 5 shiny import, library 627452680 r-world-devs/hypot… hypothes… master
#> # ℹ 5 more variables: created_at <dttm>, organization <chr>, repo_url <chr>,
#> # api_url <chr>, platform <chr>
#>
#> $purrr
#> # A tibble: 6 × 11
#> package package_usage repo_id repo_fullname repo_name default_branch
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 purrr import 495144469 r-world-devs/cohortB… cohortBu… dev
#> 2 purrr import 495151911 r-world-devs/shinyCo… shinyCoh… dev
#> 3 purrr import 586903986 r-world-devs/GitStats GitStats master
#> 4 purrr import 884789327 r-world-devs/GitAI GitAI main
#> 5 purrr import 627452680 r-world-devs/hypothe… hypothes… master
#> 6 purrr import 402384343 openpharma/DataFakeR DataFakeR master
#> # ℹ 5 more variables: created_at <dttm>, organization <chr>, repo_url <chr>,
#> # api_url <chr>, platform <chr>
#>
#> attr(,"class")
#> [1] "R_package_usage" "list"
#> attr(,"packages")
#> [1] "shiny" "purrr"
#> attr(,"only_loading")
#> [1] FALSE
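With `split_output = TRUE` the result prints as a named list with one table per package. A list of that shape can be handled with ordinary base R tools. The sketch below uses an invented `usage` list with hypothetical repository names and only two of the columns shown above; it has not been checked against a real `R_package_usage` object.

```r
# Toy stand-in for the split output: one data frame per package.
# Repository names are hypothetical.
usage <- list(
  shiny = data.frame(
    package = "shiny",
    repo_name = c("repoA", "repoB"),
    stringsAsFactors = FALSE
  ),
  purrr = data.frame(
    package = "purrr",
    repo_name = c("repoA", "repoC"),
    stringsAsFactors = FALSE
  )
)

# Number of repositories found per package.
repos_per_package <- vapply(usage, nrow, integer(1))

# Stack the per-package tables into a single table.
combined <- do.call(rbind, unname(usage))
rownames(combined) <- NULL

# Repositories that use both packages.
both <- intersect(usage$shiny$repo_name, usage$purrr$repo_name)

repos_per_package
combined
both
```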
Print GitStats to see what it stores:
git_stats
#> A GitStats object for 2 hosts:
#> Hosts: https://gitlab.com/api/v4, https://api.github.com
#> Scanning scope:
#> Organizations: [1] r-world-devs
#> Repositories: [2] mbtests/gitstatstesting, openpharma/DataFakeR
#> Storage:
#> Repositories: 6
#> Commits: 2178 [date range: 2022-01-01 - 2025-01-10]
#> Files: 51 [file pattern: \.md]
#> R_package_usage: 2 [packages: shiny, purrr]
Acknowledgement
Special thanks to James Black, Karolina Marcinkowska, Kamil Koziej, Matt Secrest, Krystian Igras, Kamil Wais, and Adam Forys for their support in developing the package.