Changelog
Source: NEWS.md
GitStats 2.2.1
- Fixed pulling repositories by code when GitStats is set to scan whole hosts (#583).
GitStats 2.2.0
CRAN release: 2025-01-14
This release brings substantial improvements: whole organizations and particular repositories can now be scanned for one host at the same time, the function that prepares commit statistics has been extended, and the workflow for getting files has been simplified.
Features:
- From now on it is possible to pass orgs and repos in set_*_host() functions (#400).
- Improved get_commits_stats() function (#556, #557) by:
  - making it possible to customize the grouping variable by passing it with the group_var parameter,
  - changing the name of the time_interval parameter to time_aggregation,
  - adding yearly aggregation to the time_aggregation parameter,
  - changing the basic input from GitStats to a commits_data object, which makes it possible to build the workflow in one pipeline (create_gitstats() |> set_*_host() |> get_commits() |> get_commits_stats()).
- Merged two functions, get_files_content() and get_files_structure(), into one get_files() (#564).
- Added .error parameter to the set_*_host() functions to control whether an error should pop up when wrong input is passed (#547).
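The single-pipeline workflow from the list above can be sketched as follows. This is a minimal sketch, not an excerpt from the package documentation: the organization and repository names are placeholders, the since date is arbitrary, and both the "year" value for time_aggregation and the unquoted column name passed to group_var are assumptions about the accepted inputs. The pipeline is wrapped in a function so that nothing touches the API until it is called with a valid token available.

```r
# Sketch of the one-pipeline commits workflow. Calling it requires the
# GitStats package and a GitHub token (e.g. in the GITHUB_PAT variable).
commits_stats_pipeline <- function() {
  GitStats::create_gitstats() |>
    GitStats::set_github_host(
      orgs = "my-org",               # placeholder organization
      repos = "other-org/some-repo"  # placeholder repository on the same host
    ) |>
    GitStats::get_commits(since = "2024-01-01") |>
    GitStats::get_commits_stats(
      time_aggregation = "year",     # assumed value for yearly aggregation
      group_var = author_login       # assumed: unquoted column name
    )
}
```

Calling commits_stats_pipeline() would then return the aggregated commits table directly, without intermediate assignments.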
Fixes:
- Fixed pulling commits for GitLab subgroups when repositories are set as scope to scan (#551).
- Filled more information on author_name and author_login if it was missing in commits_table (#550).
- A GraphQL response error when pulling repositories is now raised as an R error. Earlier, GitStats just returned an empty table with no clue on what had happened, as errors from GraphQL are returned as list outputs (they do not break code).
- Fixed getting R package usage when repositories are set (#548).
- Added possibility to pass GitHub users to orgs parameter in set_github_host() (#562).
GitStats 2.1.2
CRAN release: 2024-11-12
This is a patch release which introduces some hot fixes and new data in get_commits() output.
- Added repo_url column to output of get_commits() function (#535).
- Fixed setting default tokens when verbose mode is set to FALSE (#525) and fixed checking token scopes for GitLab (#526).
- Fixed get_repos_urls() output when individual repositories are set in set_*_host() (#529). Earlier the function pulled all repositories for an organization, even though repositories, not whole organizations, were defined for the host. This is similar to the issue solved earlier (#439).
- Fixed getting GitLab subgroups as organizations in repositories table output when pulling repositories with code (#531).
GitStats 2.1.1
CRAN release: 2024-10-25
This is a patch release which speeds up get_R_package_usage() and makes it possible to pull data on multiple R packages at once, adds a new get_storage() function and brings some fixes for checking token scopes and setting hosts.
Features:
- Optimized get_R_package_usage() function:
  - it is now possible to pass a vector of package names (new packages parameter replacing old package_name) (#494),
  - on the other hand, output of the function has been limited to contain only the most necessary data (removing all repository stats), thus making the process of obtaining package usage faster (#474),
  - new split_output parameter has been added - when set to TRUE, a list with tibbles (one element of the list for every package) is returned instead of one tibble.
- Added possibility to get repositories for individual users with get_repos() (#492). Earlier this was only possible for GitHub organizations and GitLab groups.
- Added new get_storage() function to retrieve data from GitStats object - whole or particular datasets (e.g. commits, repositories or R_package_usage) (#509).
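A call using the new packages and split_output parameters might look like the sketch below. The package names are placeholders and the gitstats argument is assumed to be a GitStats object with hosts already set. The function body is only evaluated when called, so defining it requires neither the package nor a token.

```r
# Sketch: package usage for several packages at once, one tibble per package.
# `gitstats` is assumed to be a GitStats object with hosts already configured.
package_usage_by_package <- function(gitstats) {
  GitStats::get_R_package_usage(
    gitstats,
    packages = c("dplyr", "shiny"),  # placeholder package names
    split_output = TRUE              # return a list of tibbles, not one tibble
  )
}
```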
Fixes:
- Fixed getting large search responses for GitHub (#491).
- Fixed checking token scopes (#501). If token scopes are insufficient, an error is returned and GitHost is not passed to GitStats. This also applies to the situation when GitStats looks for default tokens (not defined by user). Earlier, if tests for token failed, an empty token was passed and GitStats was created, which was misleading for the user.
- It is now possible to pass public GitHub host name (github.com or https://github.com) to set_github_host() (#475).
- It is also possible to pass hosts in a more flexible way than before (e.g. {host_url}, http://{host_url} or https://{host_url}) to host parameter in set_*_host() function (#399).
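To illustrate what accepting these three spellings involves, here is a small base R sketch that reduces them to one canonical form. It is a hypothetical helper written for this note, not the code GitStats uses internally. In particular, forcing the https scheme is a simplifying assumption.

```r
# Hypothetical helper (not part of GitStats): reduce the accepted host
# spellings to a single canonical URL.
normalize_host <- function(host) {
  host <- trimws(host)
  host <- sub("^https?://", "", host)  # drop the scheme if one was given
  host <- sub("/+$", "", host)         # drop trailing slashes
  paste0("https://", host)             # assumption: always use https
}

hosts <- c("github.com", "http://github.com", "https://github.com")
normalized <- unique(vapply(hosts, normalize_host, character(1)))
```

All three inputs collapse to the same string, so downstream code only has to handle one form.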
GitStats 2.1.0
This minor release comes with a new get_files_structure() function and adjustments to get_files_content(), so the user can pull a custom files tree from a repository (by defining pattern of files and depth of directories) and pull the content of those files.
New features:
- Added new get_files_structure() function to pull files structure for a given repository with possibility to control level of directories (depth parameter) and to limit output to files matching regex argument passed to pattern parameter (#338). Together with that, get_files() function was renamed to get_files_content() to better reflect its purpose.
- Adjusted get_files_content() so it can make use of files_structure pulled to GitStats storage with get_files_structure() function - if file_path is set to NULL and use_files_structure parameter to TRUE (both are by default) (#467).
- Added progress parameter to user functions to control showing of cli progress bar separately from messages (which are controlled with verbose) (#465).
GitStats 2.0.2
This is a patch release with substantial improvements to some functions (get_repos(), get_files() and get_R_package_usage()), adding with_files and in_files parameters, fixing cache feature and introducing new get_repos_urls() function, a minimalist version of get_repos():
- Added new get_repos_urls() function to fetch repository URLs (either web or API - choose with type parameter). It may also return only those repository URLs that contain a given file or files (by passing an argument to with_files parameter) or a text in code blobs (with_code parameter). This is a minimalist version of get_repos(), which skips the whole process of parsing (search response into repositories one) and adding statistics on repositories. This makes it poorer in content but faster (#425).
- Added with_files parameter to get_repos() function, which makes it possible to search for repositories with a given file or files and return full output for repositories.
- It is also possible now to pass multiple code phrases to with_code parameter (as a character vector) in get_repos() and get_repos_urls() (#282).
- Added in_files parameter to get_repos() which works with with_code parameter. When both are defined, GitStats searches code blobs only in given files.
- Removed dplyr::glimpse() from get_*() functions, so there is printing to console only if the result of get_*() function is not assigned to an object (#426).
- Output table of get_R_package_usage() now also contains repository full name (#438).
- Improved get_R_package_usage() by optimizing search of package names in DESCRIPTION and NAMESPACE files: the filtering method was removed and replaced with filename: filter directly in search endpoint query (#428).
- Fixed get_files() when scanning scope is set to repositories. Earlier, it pulled given files from whole organizations, even if scanning scope was set to repos with set_*_host(). Now it shows only files for the given repositories (#439).
- Improved cache feature (#436).
- verbose parameter now also controls showing of the progress bars (#453).
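The search parameters described in this release could be combined as in the sketch below. The code phrases and file names are placeholders, and gitstats is assumed to be a GitStats object with hosts already set. The type parameter of get_repos_urls() is left at its default because its accepted values are not spelled out here.

```r
# Sketch: searching repositories by code phrases and by files.
# `gitstats` is assumed to be a GitStats object with hosts already configured.
find_repos <- function(gitstats) {
  # Full repositories output, searching two phrases only in the given files
  repos <- GitStats::get_repos(
    gitstats,
    with_code = c("shiny", "golem"),          # placeholder code phrases
    in_files = c("DESCRIPTION", "NAMESPACE")  # limit code search to these files
  )
  # Lightweight variant: only URLs of repositories containing a given file
  urls <- GitStats::get_repos_urls(
    gitstats,
    with_files = "renv.lock"                  # placeholder file name
  )
  list(repos = repos, urls = urls)
}
```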
GitStats 2.0.1
This is a patch release addressing some hot issues, notably covering set_*_host() functions with verbose control, tweaking the verbose feature in general, fixing pulling data for GitLab subgroups and speeding up get_files() function.
Features:
- Getting files feature has been sped up when GitStats is set to scan whole hosts, by switching to Search API instead of pulling files via GraphQL (with iteration over organizations and repositories) (#411).
- When setting hosts to be scanned in whole (without specifying orgs or repos), GitStats no longer pulls all organizations. Pulling all organizations from host is triggered only when the user decides to pull repositories from organizations. If the user decides, e.g., to pull repositories by code, there is no need to pull all organizations (which may be a time-consuming process), as GitStats then uses Search API (#393).
- It is now possible to mute messages also from set_*_host() functions with verbose_off() or verbose parameter (#413).
- Setting verbose to FALSE does not lead to hiding output of the get_*() functions - i.e. a glimpse of table will always appear after pulling data, even if verbose is switched off. verbose parameter now serves only the purpose of showing and hiding messages to user (#423).
Fixes:
- Pulling repositories from GitLab subgroups was fixed. It did not work, as the URL of a group (org) was passed to GraphQL API the same way as to REST API, i.e. URL-encoded (with “%2F” instead of “/”).
- GitStats now returns a proper error when you pass a wrong host URL to set_*_host() function (#415).
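The difference between the two path forms in the subgroup fix can be reproduced in base R. The group path below is a placeholder. The sketch only shows the encoding itself, not how GitStats builds its requests.

```r
# A GitLab subgroup path contains a slash between group and subgroup.
group_path <- "my-group/my-subgroup"  # placeholder path

# Encoded form, as used in REST API URLs: the slash becomes %2F.
rest_path <- utils::URLencode(group_path, reserved = TRUE)

# Plain form, as expected by the GraphQL API: decoding restores the slash.
graphql_path <- utils::URLdecode(rest_path)
```

Here rest_path holds the encoded string and graphql_path is identical to the original group_path.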
GitStats 2.0.0
This is a major release with general changes in workflow (simplifying it), changes in setting GitStats hosts, deprecation of some not very useful features (like plots, setting parameters separately) and new get_release_logs() function.
Setting hosts:
- set_host() function is replaced with more explicit set_github_host() and set_gitlab_host() (#373). If you wish to connect to public host (e.g. api.github.com), you do not need to pass argument to host parameter.
Simplifying workflow:
- GitStats workflow is now simplified. To pull data on repositories, commits, R_package_usage or other, you should directly use the corresponding get_*() functions instead of pull_*(), which are deprecated. These get_*() functions pull data from API, parse it into a table, add some goodies (additional columns) if needed and return a table instead of a GitStats object, which in our opinion is more intuitive and user-friendly (#345). That means you do not need to run two or three additional function calls in a pipe as before, e.g. pull_repos(gitstats_object) %>% get_repos() %>% get_repos_stats(), but you just run get_repos(gitstats_object) to get the data you need.
- Moreover, if you run a get_*() function for the second time, GitStats will pull the data from its storage and not from API as for the first time, unless you change parameters for the function (e.g. starting date with since in get_commits()) or change directly the cache parameter in the function (#333).
- pull_repos_contributors() as a separate function is deprecated. The parameter add_contributors is now set by default to TRUE in get_repos(), which seems more reasonable as the user gets all the data.
- In get_commits() old parameters (date_from and date_until) were replaced with new, more concise ones (since and until).
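The simplified workflow can be sketched as below. The dates are arbitrary, gitstats_object is assumed to be a GitStats object with hosts already set, and cache = FALSE is an assumed way of forcing a fresh pull from the API rather than a documented value.

```r
# Sketch of the 2.0.0 workflow: get_*() functions return tables directly.
# `gitstats_object` is assumed to be a GitStats object with hosts configured.
pull_repos_and_commits <- function(gitstats_object) {
  # One call replaces the former pull_repos() %>% get_repos() %>% get_repos_stats()
  repos <- GitStats::get_repos(gitstats_object)

  # since/until replace the former date_from/date_until parameters
  commits <- GitStats::get_commits(
    gitstats_object,
    since = "2024-01-01",
    until = "2024-06-30"
  )

  # A repeated call with the same parameters reads from storage;
  # cache = FALSE is assumed here to force pulling from the API again.
  commits_fresh <- GitStats::get_commits(
    gitstats_object,
    since = "2024-01-01",
    until = "2024-06-30",
    cache = FALSE
  )

  list(repos = repos, commits = commits, commits_fresh = commits_fresh)
}
```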
Changes to setting parameters and pulling repositories by code:
- set_params() function is removed (#386). Now the logic is moved straight to get_*() functions. For example, if you want to pull repositories with a specific code blob, you do not need to define anything with set_params() (as previously with search_mode and phrase parameter), but you simply run get_repos(with_code = 'your_code') (#333).
- New logical parameter verbose has been introduced for limiting messages to user when pulling data - this parameter can be set in all get_*() functions. You can also turn the verbose mode on/off globally with verbose_on()/verbose_off() functions.
Deprecate:
- get_repos_stats() function was deprecated as its role was unclear - unlike get_commit_stats() it did not aggregate repositories data into a new stats table, but added only some new numeric columns, like number of contributors (contributors_n) or last activity in difftime format, which is now done within get_repos() function.
- Pulling by team and filtering by language is no longer supported - these features were quite heavy for the package performance and did not bring much added value. If needed, the user can always filter the output (formatted responses pulled from API) by contributors or language (#384).
- Plot functions are no longer a feature of GitStats. They have been deprecated as the package is meant to be basically for back end purposes and this is the field where developer’s effort should now go (#381). If needed and requested, plot functions may be brought up once more in next releases.
New features:
- Added get_release_logs() (#356).
- get_orgs() is renamed to show_orgs() to reflect that it does not pull data from API, but only shows what is in GitStats object.
- Commits response consists now of two new columns: author_login and author_name (#332). This is due to the mix of GitHub/GitLab handles and display names in the author column (the original author name field in commits API response).
- Improved printing of GitStats object - now when you return GitStats object in console, it prints GitStats data divided into sections to give more readable information to user: scanning scope (organizations and repositories), and storage (the output tables stored in GitStats with basic information on dimensions) (#329).
Bug fixes:
- Pagination was introduced to contributors response (#331).
- Fixed handler of dates parameters when pulling commits. The wrong and overly complex construction of the gts_to_posixt() helper, which depended on stringr, caused an empty value to be passed to the since parameter of the commits endpoint for some users, which ended in a Bad Request Error (400) and an infinite loop of retrying the response (#360).
GitStats 1.1.0
New features:
- pull_R_package_usage() with get_R_package_usage() functions to pull repositories where package name is found in DESCRIPTION or NAMESPACE files or code blobs with phrases related to using an R package (library(package), require(package)) (#326, #341),
- pull_files() with get_files() to pull content of text files (#200),
- possibility to pass specific repositories to GitStats with set_host() function by using repos parameter instead of orgs (#330).
Minor changes and features:
- renamed columns in repository output - id to repo_id and name to repo_name,
- added a default_branch column to repositories output as a consequence of #200.
GitStats 1.0.0
Major changes:
New features:
- added setting tokens by default - if the user does have all the PATs set up in environment variables (as e.g. GITHUB_PAT or GITLAB_PAT), there is no need to pass them as an argument to set_host() (#120),
- added pull_users() function to pull information on users (#199),
- added possibility of scanning whole internal git platforms if no orgs are passed (#258),
- added get_orgs() function to print all organizations (#283),
- added resetting all settings to default with reset() function (#270),
- added resetting language in your search preferences with reset_language() or setting language parameter to All in setup() function (#231).
Improving performance with REST and GraphQL APIs:
- added switching to REST engine in case GraphQL fails with 502 error (#225)
- added GraphQL engine for getting GitLab repositories by organization (#218)
- removed contributors as basic stat when pulling repos by org and by phrase to improve speed of pulling repositories data. Added pull_repos_contributors() user function and add_contributors parameter to pull_repos() function to add conditionally information on contributors to repositories table (#235).
GitStats 0.1.0
This is the first release of GitStats with the following features:
- create_gitstats() - creating GitStats object,
- set_connection() - adding hosts to GitStats object,
- setup() - setting search parameter to org, team or phrase, setting programming language of repositories,
- get_repos() - pulling repositories from GitHub and GitLab API in a standardized table,
- get_commits() - pulling commits from GitHub and GitLab API in a standardized table,
- set_team_member() - adding team members to GitStats object.