Changelog
GitStats 2.3.7
CRAN release: 2025-10-16
A patch release with improvements and fixes to how repository data is pulled when getting commits and files. It also redesigns progress bars, which are now displayed at the GitHost level instead of the organization level.
- Handled a GitLab GraphQL error in the get_repos_data() method used by get_commits() and get_files() by switching to the REST API engine (#690).
- Introduced caching of repository data, as several functions for getting commits, files, issues etc. make use of this data (#693).
- Simplified code for handling complexity error when getting files from GitLab repositories (#695).
- Introduced changes to progress bars, most notably moved them to the GitHost level to display high-level progress (#687, #697).
- Made integration tests work for local testing (#595).
GitStats 2.3.6
CRAN release: 2025-09-17
A minor release with substantial performance improvements when searching repositories by code, and new features such as filtering repository data by language and new columns in the get_repos() and get_files() outputs.
- Added commit_sha column to the get_repos() and get_files() outputs (#546).
- Fixed the depth parameter in get_files(): previously the values 0 and 1 returned the same output, i.e. files from the root. Now it works as explained in the function documentation: value 0 returns files from the root and value 1 goes one level deeper (#663).
- Added language parameter to the get_repos() function to pull only repositories with the given language (#654). For the GitHub Search API it translates into a language query, whereas in other cases the repositories output is simply filtered by the given language.
- Introduced a new repo_fullpath column, which replaced fullname in the output of get_repos(). The fullname column was flawed for GitLab repositories, as it was built from the repository name and the organization. In GitLab the repository name (which is more of a user-friendly label) differs from the repository path (which appears in the URL), unlike in GitHub, where the repository name is the repository path (#659). The repo_name column for GitLab repositories now mirrors the repository path.
- Extended the role of verbose to control the display of response error statuses (#669).
- Improved the code for searching code blobs, so get_repos() does not fail when the user passes text, e.g. with spaces, to the with_code parameter (#673).
- Standardized the repo_id column in the get_repos() and get_files() outputs for GitLab hosts: it now consists only of digits formatted as a character (#675).
- Optimized parsing of search responses into repositories responses (#679).
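The repo_id standardization can be pictured with a small sketch. GitStats itself is written in R; this Python snippet only illustrates the idea. It assumes the non-numeric IDs come from GitLab's GraphQL API, which returns global IDs such as gid://gitlab/Project/123, while the REST API returns the bare number. normalize_repo_id is a hypothetical helper, not part of GitStats.

```python
import re


def normalize_repo_id(raw_id) -> str:
    """Return only the trailing digits of a repository ID, as a string.

    Hypothetical helper: accepts a GitLab GraphQL global ID such as
    "gid://gitlab/Project/123" or a REST numeric ID such as 123.
    """
    match = re.search(r"(\d+)$", str(raw_id))
    if match is None:
        raise ValueError(f"no numeric ID found in {raw_id!r}")
    return match.group(1)


print(normalize_repo_id("gid://gitlab/Project/123"))  # 123
print(normalize_repo_id(123))                         # 123
```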
GitStats 2.3.4
CRAN release: 2025-07-07
- Enabled setting public hosts without specifying an organizations or repositories scope (#640). This change was motivated by the need to call functions based on the Search API on public hosts (such as get_repos(with_code = {code})), whose performance is acceptable on large public repositories. For other, slower functions, users are informed of the estimated data retrieval time via a progress bar.
- Enabled getting repository trees for whole hosts (#641).
- Removed the get_repos_with_R_packages() function (#644), as it is not in line with GitStats logic: getting formatted git data from repositories.
- Standardized some column names (repo_name instead of repository, githost instead of platform) in some tables (issues, files_content and repos) (#632).
GitStats 2.3.2
CRAN release: 2025-05-20
- Added get_repos_trees() function (#614).
- Fixed errors when pulling data on repositories with code (#634).
GitStats 2.3.1
CRAN release: 2025-04-25
A patch release with hot fixes to functions pulling repositories with code.
- Fixed pulling GitLab repositories with no issues (#616).
- Handled GraphQL errors caused by old GitLab versions by switching to the REST engine (#615, @ThomUK).
- Removed switching to iterating over organizations after a large GitLab response error (over 10 thousand responses) (#618), as the process took very long.
GitStats 2.3.0
CRAN release: 2025-04-03
This release introduces new functions to get data on organizations and issues, alongside several important fixes and optimizations, such as handling GitLab API limits more efficiently. Additional enhancements include a renamed function, information on time used, and shortened data-pulling messages.
- Added get_orgs() function (#599).
- Added get_issues() and get_issues_stats() functions (#569).
Fixes:
- Fixed the get_files() function when pattern is not defined (#605). Before, the function call resulted in an empty response or an error.
- Optimized pulling repositories with code by:
  - handling the limit of 10 thousand responses on the GitLab API (#607),
  - switching to GraphQL API methods for parsing search responses (still gathered via REST) into repositories output.
Other:
- Renamed get_R_package_usage() to get_repos_with_R_package(), as the output resembles the one from get_repos().
- Added information on time used for pulling the data (#603).
- Shortened messages when pulling data from repositories (#587).
GitStats 2.2.2
CRAN release: 2025-02-11
- Fixed pulling repository URLs by code when GitStats is set to scan whole hosts (#589). Set the type parameter to api by default. Setting type to web results in parsing GitLab api URLs, which may be time-consuming, so it should not be the default option.
GitStats 2.2.1
CRAN release: 2025-01-21
- Fixed pulling repositories by code when GitStats is set to scan whole hosts (#583).
GitStats 2.2.0
CRAN release: 2025-01-14
This release brings substantial improvements: it is now possible to scan whole organizations and particular repositories on one host at the same time, the function preparing commit statistics has been boosted, and the workflow for getting files has been simplified.
Features:
- From now on it is possible to pass both orgs and repos in set_*_host() functions (#400).
- Improved the get_commits_stats() function (#556, #557) by:
  - making it possible to customize the grouping variable by passing it with the group_var parameter,
  - renaming the time_interval parameter to time_aggregation,
  - adding yearly aggregation to the time_aggregation parameter,
  - changing the basic input from GitStats to a commits_data object, which allows building the workflow in one pipeline (create_gitstats() |> set_*_host() |> get_commits() |> get_commits_stats()).
- Merged the two functions get_files_content() and get_files_structure() into one get_files() (#564).
- Added .error parameter to the set_*_host() functions to control whether an error should pop up when wrong input is passed (#547).
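To make the time_aggregation and group_var parameters concrete, here is a minimal Python sketch of counting commits per period and group. It is not GitStats code (the package is written in R): only the parameter names come from the entry above, while the accepted aggregation values ("year", "month", "week", "day"), the committed_date field, and the commits_stats helper are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def truncate(d: date, aggregation: str) -> date:
    """Map a commit date to the first day of its year, month, ISO week or day."""
    if aggregation == "year":
        return date(d.year, 1, 1)
    if aggregation == "month":
        return date(d.year, d.month, 1)
    if aggregation == "week":
        return date.fromordinal(d.toordinal() - d.weekday())  # back to Monday
    if aggregation == "day":
        return d
    raise ValueError(f"unknown aggregation: {aggregation}")


def commits_stats(commits, time_aggregation="month", group_var="repo_name"):
    """Count commits per (period, group) pair; each commit is a dict.

    Hypothetical sketch: 'committed_date' is an assumed field name.
    """
    counts = Counter(
        (truncate(c["committed_date"], time_aggregation), c[group_var])
        for c in commits
    )
    return sorted(counts.items())


commits = [
    {"committed_date": date(2024, 1, 5), "repo_name": "a"},
    {"committed_date": date(2024, 1, 20), "repo_name": "a"},
    {"committed_date": date(2024, 3, 1), "repo_name": "b"},
]
for (period, group), n in commits_stats(commits, "month"):
    print(period, group, n)
```

Swapping time_aggregation changes only how dates are bucketed, and group_var changes only which column splits the counts. That separation is what lets the two be customized independently.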
Fixes:
- Fixed pulling commits for GitLab subgroups when repositories are set as scope to scan (#551).
- Filled in more information on author_name and author_login where it was missing in commits_table (#550).
- Handled GraphQL response errors when pulling repositories by raising an R error. Earlier, GitStats just returned an empty table with no clue as to what had happened, as errors from GraphQL are returned as list outputs (they do not break code).
- Fixed getting R package usage when repositories are set (#548).
- Added the possibility to pass GitHub users to the orgs parameter in set_github_host() (#562).
GitStats 2.1.2
CRAN release: 2024-11-12
This is a patch release which introduces some hot fixes and new data in get_commits() output.
- Added repo_url column to the output of the get_commits() function (#535).
- Fixed setting default tokens when verbose mode is set to FALSE (#525) and fixed checking token scopes for GitLab (#526).
- Fixed get_repos_urls() output when individual repositories are set in set_*_host() (#529). Earlier, the function pulled all repositories of an organization, even though repositories, not whole organizations, were defined for the host. This is similar to the issue solved earlier (#439).
- Fixed getting GitLab subgroups as organizations in the repositories table output when pulling repositories with code (#531).
GitStats 2.1.1
CRAN release: 2024-10-25
This is a patch release which makes get_R_package_usage() faster and able to pull data on multiple R packages at once, introduces a new get_storage() function, and brings some fixes for checking token scopes and setting hosts.
Features:
- Optimized the get_R_package_usage() function:
  - it is now possible to pass a vector of package names (new packages parameter replacing the old package_name) (#494),
  - on the other hand, the output of the function has been limited to the most necessary data (all repository stats removed), which makes obtaining package usage faster (#474),
  - a new split_output parameter has been added: when set to TRUE, a list of tibbles (one element per package) is returned instead of one tibble.
- Added the possibility to get repositories for individual users with get_repos() (#492). Earlier this was only possible for GitHub organizations and GitLab groups.
- Added a new get_storage() function to retrieve data from the GitStats object, either whole or particular datasets (e.g. commits, repositories or R_package_usage) (#509).
Fixes:
- Fixed getting large search responses for GitHub (#491).
- Fixed checking token scopes (#501). If token scopes are insufficient, an error is returned and the GitHost is not passed to GitStats. This also applies when GitStats looks for default tokens (not defined by the user). Earlier, if the token tests failed, an empty token was passed and GitStats was created, which was misleading for the user.
- It is now possible to pass the public GitHub host name (github.com or https://github.com) to set_github_host() (#475).
- It is also possible to pass hosts in a more flexible way than before (e.g. {host_url}, http://{host_url} or https://{host_url}) to the host parameter in set_*_host() functions (#399).
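The flexible host input from #399 amounts to normalizing several spellings of the same host. A Python sketch, assuming all three forms should resolve to the bare host name and that https is implied when no scheme is given; normalize_host is hypothetical, and GitStats' own R implementation may differ.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Reduce '{host}', 'http://{host}' or 'https://{host}' to the bare host name.

    Hypothetical helper for illustration only.
    """
    host = host.strip().rstrip("/")
    if "://" not in host:
        host = "https://" + host  # assumption: https when no scheme is given
    parsed = urlparse(host)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid host URL: {host!r}")
    return parsed.netloc.lower()


for raw in ["gitlab.example.com", "http://gitlab.example.com", "https://gitlab.example.com/"]:
    print(normalize_host(raw))  # gitlab.example.com each time
```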
GitStats 2.1.0
This minor release comes with a new get_files_structure() function and adjustments to get_files_content(), so the user can pull a custom files tree from a repository (by defining a file pattern and directory depth) and then pull the content of those files.
New features:
- Added a new get_files_structure() function to pull the files structure of a given repository, with the possibility to control the level of directories (depth parameter) and to limit the output to files matching a regex passed to the pattern parameter (#338). Together with that, the get_files() function was renamed to get_files_content() to better reflect its purpose.
- Adjusted get_files_content() so it can make use of the files_structure pulled to GitStats storage with the get_files_structure() function, if file_path is set to NULL and the use_files_structure parameter to TRUE (both are defaults) (#467).
- Added a progress parameter to user functions to control showing of the cli progress bar separately from messages (which are controlled with verbose) (#465).
GitStats 2.0.2
This is a patch release with substantial improvements to some functions (get_repos(), get_files() and get_R_package_usage()): it adds the with_files and in_files parameters, fixes the cache feature and introduces a new get_repos_urls() function, a minimalist version of get_repos():
- Added a new get_repos_urls() function to fetch repository URLs (either web or API: choose with the type parameter). It can also return only those repository URLs that contain a given file or files (by passing an argument to the with_files parameter) or a text in code blobs (with_code parameter). This is a minimalist version of get_repos(), which skips all the parsing (of search responses into repositories) and the adding of statistics on repositories. This makes its output poorer in content but faster to obtain (#425).
- Added a with_files parameter to the get_repos() function, which makes it possible to search for repositories with a given file or files and return the full output for those repositories.
- It is now also possible to pass multiple code phrases to the with_code parameter (as a character vector) in get_repos() and get_repos_urls() (#282).
- Added an in_files parameter to get_repos(), which works with the with_code parameter. When both are defined, GitStats searches code blobs only in the given files.
- Removed dplyr::glimpse() from get_*() functions, so output is printed to the console only if the result of a get_*() function is not assigned to an object (#426).
- The output table of get_R_package_usage() now also contains the repository full name (#438).
- Improved get_R_package_usage() by optimizing the search for package names in DESCRIPTION and NAMESPACE files: the filtering method was removed and replaced with a filename: filter directly in the search endpoint query (#428).
- Fixed get_files() when the scanning scope is set to repositories. Earlier, it pulled the given files from whole organizations, even if the scanning scope was set to repos with set_*_host(). Now it shows only files for the given repositories (#439).
- Improved the cache feature (#436).
- The verbose parameter now controls showing of the progress bars (#453).
GitStats 2.0.1
This is a patch release addressing some pressing issues, notably adding verbose control to set_*_host() functions, tweaking the verbose feature in general, fixing pulling data for GitLab subgroups and speeding up the get_files() function.
Features:
- Getting files has been sped up when GitStats is set to scan whole hosts, by switching to the Search API instead of pulling files via GraphQL (with iteration over organizations and repositories) (#411).
- When hosts are set to be scanned in whole (without specifying orgs or repos), GitStats no longer pulls all organizations. Pulling all organizations from a host is triggered only when the user decides to pull repositories from organizations. If the user decides, e.g., to pull repositories by code, there is no need to pull all organizations (which may be a time-consuming process), as GitStats then uses the Search API (#393).
- It is now possible to mute messages from set_*_host() functions as well, with verbose_off() or the verbose parameter (#413).
- Setting verbose to FALSE no longer hides the output of the get_*() functions, i.e. a glimpse of the table always appears after pulling data, even if verbose is switched off. The verbose parameter now serves only to show and hide messages to the user (#423).
Fixes:
- Pulling repositories from GitLab subgroups was fixed. It did not work because the URL of a group (org) was passed to the GraphQL API the same way as to the REST API, i.e. URL-encoded ("%2F" instead of "/").
- GitStats now returns a proper error when you pass a wrong host URL to a set_*_host() function (#415).
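The subgroup fix hinges on a difference between GitLab's two APIs: the REST API takes a namespaced group path inside the URL, so "/" must be URL-encoded as "%2F", whereas the GraphQL API takes the full path as a plain string argument. A Python sketch of both request shapes; the helper names are hypothetical, and the query selects only id and name for brevity.

```python
from urllib.parse import quote


def rest_group_url(api_base: str, full_path: str) -> str:
    """REST: the group path is part of the URL, so '/' must become '%2F'."""
    return f"{api_base.rstrip('/')}/groups/{quote(full_path, safe='')}"


def graphql_group_payload(full_path: str) -> dict:
    """GraphQL: the full path is a plain string variable, '/' left as-is."""
    query = "query($fullPath: ID!) { group(fullPath: $fullPath) { id name } }"
    return {"query": query, "variables": {"fullPath": full_path}}


print(rest_group_url("https://gitlab.com/api/v4", "my-group/my-subgroup"))
# https://gitlab.com/api/v4/groups/my-group%2Fmy-subgroup
print(graphql_group_payload("my-group/my-subgroup")["variables"])
# {'fullPath': 'my-group/my-subgroup'}
```

Sending the encoded form as fullPath would make GraphQL look up a group literally named "my-group%2Fmy-subgroup". That matches the failure the entry above describes.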
GitStats 2.0.0
This is a major release with general changes to the workflow (simplifying it), changes in setting GitStats hosts, deprecation of some less useful features (like plots and setting parameters separately) and a new get_release_logs() function.
Setting hosts:
- The set_host() function is replaced with the more explicit set_github_host() and set_gitlab_host() (#373). If you wish to connect to a public host (e.g. api.github.com), you do not need to pass an argument to the host parameter.
Simplifying workflow:
- The GitStats workflow is now simplified. To pull data on repositories, commits, R_package_usage or other data, use the corresponding get_*() functions directly instead of the pull_*() functions, which are deprecated. These get_*() functions pull data from the API, parse it into a table, add some goodies (additional columns) if needed and return a table instead of a GitStats object, which in our opinion is more intuitive and user-friendly (#345). That means you no longer need to chain two or three additional function calls as before, e.g. pull_repos(gitstats_object) %>% get_repos() %>% get_repos_stats(); you just run get_repos(gitstats_object) to get the data you need.
- Moreover, if you run a get_*() function for the second time, GitStats pulls the data from its storage and not from the API as it did the first time, unless you change the function's parameters (e.g. the starting date with since in get_commits()) or directly change the cache parameter in the function (#333).
- pull_repos_contributors() as a separate function is deprecated. The add_contributors parameter is now set to TRUE by default in get_repos(), which seems more reasonable, as the user gets all the data.
- In get_commits() the old parameters (date_from and date_until) were replaced with new, more concise ones (since and until).
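The caching rule above (reuse stored data unless the parameters change or cache is turned off) can be sketched as a small memoizing store. This Python class is hypothetical and only mirrors the rule stated in the entry. It is not how GitStats, an R package, stores data internally.

```python
class Storage:
    """Hypothetical cache: reuse a stored dataset unless the parameters change."""

    def __init__(self, fetch):
        self.fetch = fetch   # callable that would hit the API
        self._store = {}     # dataset name -> (params key, result)
        self.api_calls = 0

    def get(self, name, cache=True, **params):
        key = tuple(sorted(params.items()))
        stored = self._store.get(name)
        if cache and stored is not None and stored[0] == key:
            return stored[1]  # same parameters: serve from storage
        self.api_calls += 1   # first call, changed parameters, or cache=False
        result = self.fetch(name, **params)
        self._store[name] = (key, result)
        return result


def fake_fetch(name, **params):
    return f"{name} since {params.get('since')}"


storage = Storage(fake_fetch)
storage.get("commits", since="2024-01-01")
storage.get("commits", since="2024-01-01")  # served from storage
print(storage.api_calls)  # 1
```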
Changes to setting parameters and pulling repositories by code:
- The set_params() function is removed (#386). The logic is now moved straight into the get_*() functions. For example, if you want to pull repositories with a specific code blob, you do not need to define anything with set_params() (as previously with the search_mode and phrase parameters); you simply run get_repos(with_code = 'your_code') (#333).
- A new logical parameter, verbose, has been introduced for limiting messages to the user when pulling data; it can be set in all get_*() functions. You can also turn verbose mode on or off globally with the verbose_on() and verbose_off() functions.
Deprecated:
- The get_repos_stats() function was deprecated as its role was unclear: unlike get_commit_stats(), it did not aggregate repositories data into a new stats table, but only added some new numeric columns, like the number of contributors (contributors_n) or last activity in difftime format, which is now done within the get_repos() function.
- Pulling by team and filtering by language are no longer supported: these features were quite heavy for package performance and did not bring much added value. If needed, users can always filter the output (formatted responses pulled from the API) by contributors or language (#384).
- Plot functions are no longer a feature of GitStats. They have been deprecated because the package is meant basically for back-end purposes, and this is where developer effort should now go (#381). If needed and requested, plot functions may be brought back in future releases.
New features:
- Added get_release_logs() (#356).
- get_orgs() is renamed to show_orgs() to reflect that it does not pull data from the API, but only shows what is in the GitStats object.
- The commits response now contains two new columns: author_login and author_name (#332). This is due to the mix of GitHub/GitLab handles and display names in the author column (the original author name field in the commits API response).
- Improved printing of the GitStats object: when you return a GitStats object in the console, it prints GitStats data divided into sections to give more readable information to the user: scanning scope (organizations and repositories) and storage (the output tables stored in GitStats with basic information on their dimensions) (#329).
Bug fixes:
- Pagination was introduced to the contributors response (#331).
- Fixed handling of date parameters when pulling commits. A wrong and complex construction of the gts_to_posixt() helper, which depended on stringr, caused an empty value to be passed, for some users, to the since parameter of the commits endpoint, which ended in a Bad Request Error (400) and an infinite loop of retrying the request (#360).
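The #360 bug was an empty since value reaching the commits endpoint. A Python sketch of the defensive alternative, assuming the endpoint expects an ISO 8601 UTC timestamp (as GitHub's REST commits endpoint does): format dates explicitly and fail loudly on empty input rather than send an empty parameter. to_iso8601 is a hypothetical helper, not a port of GitStats' gts_to_posixt().

```python
from datetime import date, datetime, timezone


def to_iso8601(value) -> str:
    """Format a date or datetime as 'YYYY-MM-DDTHH:MM:SSZ' in UTC.

    Raises instead of returning an empty string, so an empty 'since'
    value never reaches the API.
    """
    if value is None or value == "":
        raise ValueError("date parameter must not be empty")
    if isinstance(value, str):
        value = date.fromisoformat(value)  # e.g. "2024-01-31"
    if isinstance(value, datetime):
        # naive datetimes are treated as UTC, aware ones are converted
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        else:
            value = value.astimezone(timezone.utc)
    else:
        value = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso8601("2024-01-31"))  # 2024-01-31T00:00:00Z
```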
GitStats 1.1.0
New features:
- pull_R_package_usage() with get_R_package_usage() functions to pull repositories where the package name is found in DESCRIPTION or NAMESPACE files or in code blobs with phrases related to using an R package (library(package), require(package)) (#326, #341),
- pull_files() with get_files() to pull the content of text files (#200),
- the possibility to pass specific repositories to GitStats with the set_host() function by using the repos parameter instead of orgs (#330).
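Which code phrases count as "using" a package can be illustrated with a local regular expression. This is a Python sketch only: GitStats searches code blobs through the hosts' search APIs, and this hypothetical uses_package check covers just the library(package) and require(package) forms named above, quoted or unquoted.

```python
import re


def uses_package(code: str, package: str) -> bool:
    """True if R code attaches `package` with library() or require().

    Hypothetical check: calls with extra arguments, e.g.
    library(pkg, quietly = TRUE), are deliberately not matched.
    """
    pattern = (
        r"\b(?:library|require)\(\s*"                  # call and opening parenthesis
        + r"['\"]?" + re.escape(package) + r"['\"]?"   # name, optionally quoted
        + r"\s*\)"                                     # closing parenthesis
    )
    return re.search(pattern, code) is not None


print(uses_package("library(dplyr)\nx <- 1", "dplyr"))  # True
print(uses_package("library(dplyrExtra)", "dplyr"))     # False
```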
Minor changes and features:
- renamed columns in the repository output: id to repo_id and name to repo_name,
- added a default_branch column to the repositories output as a consequence of #200.
GitStats 1.0.0
Major changes:
New features:
- added setting tokens by default: if the user has all the PATs set up in environment variables (e.g. GITHUB_PAT or GITLAB_PAT), there is no need to pass them as an argument to set_host() (#120),
- added the pull_users() function to pull information on users (#199),
- added the possibility of scanning whole internal git platforms if no orgs are passed (#258),
- added the get_orgs() function to print all organizations (#283),
- added resetting all settings to default with the reset() function (#270),
- added resetting the language in your search preferences with reset_language() or by setting the language parameter to All in the setup() function (#231).
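Default tokens amount to a lookup order: an explicitly passed token wins, otherwise the environment variable is used. A Python sketch under that assumption; the resolve_token helper and the host-type keys are hypothetical, and only the GITHUB_PAT and GITLAB_PAT variable names come from the entry. Raising when nothing is found mirrors the later token-scope fix (#501) rather than the 1.0.0 behaviour.

```python
import os
from typing import Optional

# Only the variable names come from the changelog; the keys are hypothetical.
DEFAULT_TOKEN_VARS = {"github": "GITHUB_PAT", "gitlab": "GITLAB_PAT"}


def resolve_token(host_type: str, token: Optional[str] = None) -> str:
    """Return the explicit token if given, else the one from the environment."""
    if token:
        return token
    var_name = DEFAULT_TOKEN_VARS[host_type]
    value = os.environ.get(var_name, "")
    if not value:
        raise RuntimeError(f"No token passed and {var_name} is not set.")
    return value
```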
Improving performance with REST and GraphQL APIs:
- added switching to REST engine in case GraphQL fails with 502 error (#225)
- added GraphQL engine for getting GitLab repositories by organization (#218)
- removed contributors as a basic stat when pulling repos by org and by phrase, to improve the speed of pulling repositories data. Added the pull_repos_contributors() user function and the add_contributors parameter to the pull_repos() function to conditionally add information on contributors to the repositories table (#235).
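The REST fallback from #225 follows a try-then-switch pattern. This is a hypothetical Python sketch: GraphQLError, fetch_repos and the fetch callables are illustrative names, and only an HTTP 502 triggers the switch, as in the entry. GraphQL APIs can also report errors inside a successful response body, which this sketch does not handle.

```python
class GraphQLError(Exception):
    """Raised by a hypothetical GraphQL client, carrying the HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"GraphQL request failed with status {status}")
        self.status = status


def fetch_repos(org, graphql_fetch, rest_fetch, fallback_statuses=(502,)):
    """Try the GraphQL engine first; switch to REST on listed statuses only."""
    try:
        return graphql_fetch(org)
    except GraphQLError as err:
        if err.status not in fallback_statuses:
            raise  # e.g. 401: switching engines would not help
        return rest_fetch(org)


def failing_graphql(org):
    raise GraphQLError(502)


def rest(org):
    return [f"{org}/repo-from-rest"]


print(fetch_repos("acme", failing_graphql, rest))  # ['acme/repo-from-rest']
```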
GitStats 0.1.0
This is the first release of GitStats with given features:
- create_gitstats(): creating a GitStats object,
- set_connection(): adding hosts to the GitStats object,
- setup(): setting the search parameter to org, team or phrase, and setting the programming language of repositories,
- get_repos(): pulling repositories from the GitHub and GitLab APIs into a standardized table,
- get_commits(): pulling commits from the GitHub and GitLab APIs into a standardized table,
- set_team_member(): adding team members to the GitStats object.