Introduction
The skiresultsR package provides a comprehensive set of
functions for extracting and analyzing ski race results from HTML files
generated by skiresults.co.uk. This vignette will guide you through the
main features and show you how to extract structured data from ski race
events.
Installation
# Install from GitHub
devtools::install_github("justinjtownsend/skiresultsR")
# Or install locally
devtools::install()
Sourcing the Data
# Get path to sample data
file_path <- system.file("extdata", "chatham_oct2023.html", package = "skiresultsR")
Main Functions Overview
The skiresultsR package provides six main functions,
organized into primary and helper functions:
Primary Functions
- get_event() - Extract complete event data (recommended starting point)
- get_races() - Extract all race tables from an event
- get_race() - Extract a specific race by race type/ID
Helper Functions
- get_racers() - Extract unique racer information with profile links
- get_points() - Extract racer points across all categories
- get_event_summary() - Extract event metadata (title, date, venue, etc.)
Basic Usage
1. Getting Complete Event Data
The fastest way to start is with get_event(), which
extracts all available event data:
# Extract complete event data
event_data <- get_event(file_path)
# View the structure
str(event_data, max.level = 1)
#> List of 6
#> $ event_dtls :'data.frame': 1 obs. of 6 variables:
#> $ race_types : tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
#> $ races :List of 5
#> $ racers : tibble [97 × 7] (S3: tbl_df/tbl/data.frame)
#> $ race_points: tibble [0 × 0] (S3: tbl_df/tbl/data.frame)
#> Named list()
#> $ clubs : tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#> - attr(*, "class")= chr [1:2] "skiresults_event" "list"
The get_event() function returns a list of class skiresults_event. In the
str() output above the components sit at the top level of that list; if
your version of the package instead wraps them in an outer list named by
event_id, take the inner element with event_data[[1]]. The components are:
- event_dtls: Event details (title, date, slope, format, status)
- race_types: Race types information
- races: All race tables with complete race information
- racers: Unique racers with their profile links and club information
- race_points: Points data across all categories and races
- clubs: Club information found in the event
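The component list above can also be checked programmatically. The following is a minimal base R sketch, not part of skiresultsR: mock_event is a hand-built stand-in for an event object and describe_components() is a hypothetical helper name, so the values shown are illustrative only.

```r
# Hypothetical stand-in for an event object; real objects come from get_event().
mock_event <- list(
  event_dtls = data.frame(title = "Example Event", stringsAsFactors = FALSE),
  races      = list(`race-1` = data.frame(Rank = 1:2),
                    `race-2` = data.frame(Rank = 1:3)),
  racers     = data.frame(Name = c("A", "B", "C"),
                          Club = c("X", "Y", NA),
                          stringsAsFactors = FALSE)
)

# Describe each component: its first class and its size
# ("rows x cols" for data frames, length for anything else).
describe_components <- function(x) {
  data.frame(
    component = names(x),
    class = unname(vapply(x, function(el) class(el)[1], character(1))),
    size = unname(vapply(x, function(el) {
      if (is.data.frame(el)) {
        paste(nrow(el), "x", ncol(el))
      } else {
        as.character(length(el))
      }
    }, character(1))),
    stringsAsFactors = FALSE
  )
}

components <- describe_components(mock_event)
print(components)
```

Run against a real event object, a table like this makes it easy to spot an empty component before going further.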
2. Exploring Event Details
# Get the event object. The str() output above shows the components at the
# top level; fall back to the inner element if they are wrapped by event_id.
event <- if ("event_dtls" %in% names(event_data)) event_data else event_data[[1]]
# Display key information from event_dtls
cat("Event Title:", event$event_dtls$title, "\n")
cat("Date:", event$event_dtls$date, "\n")
cat("Venue:", event$event_dtls$slope, "\n")
# Show available race types
cat("\nAvailable Race Types:\n")
print(event$race_types$race_type)
3. Working with Race Data
# View available races
race_names <- names(event$races)
cat("Found", length(race_names), "races:\n")
cat(paste(race_names, collapse = ", "), "\n")
# Examine the first race
if (length(race_names) > 0) {
  first_race <- event$races[[1]]
  cat("\nFirst race structure:\n")
  cat("- Race ID:", race_names[1], "\n")
  cat("- Number of participants:", nrow(first_race), "\n")
  cat("- Columns:", paste(names(first_race), collapse = ", "), "\n")
  # Show first few rows
  if (nrow(first_race) > 0) {
    cat("\nFirst few participants:\n")
    print(head(first_race, 3))
  }
}
4. Analyzing Racer Information
# View racer information
racers <- event$racers
cat("Found", nrow(racers), "unique racers\n")
cat("\nFirst few racers:\n")
print(head(racers, 5))
# Check for club information
if ("Club" %in% names(racers)) {
  racers_with_clubs <- sum(!is.na(racers$Club) & racers$Club != "")
  cat("\nRacers with club information:", racers_with_clubs, "\n")
}
Specialized Functions
Getting Event Summary Only
If you need event summary statistics, use
get_event_summary() with an event object:
# Get event summary
summary <- get_event_summary(event_data)
# Access race summary and participation statistics
event_id <- names(summary)[1]
cat("Event ID:", event_id, "\n")
#> Event ID: 1319
cat("\nRace Summary:\n")
#>
#> Race Summary:
print(summary[[1]]$race_summary)
#> # A tibble: 1 × 8
#> race_id tot_racers overall_time_fastest overall_time_slowest
#> <chr> <int> <dbl> <dbl>
#> 1 race-9973 97 31.1 74.6
#> # ℹ 4 more variables: overall_time_average <dbl>, overall_time_dns <int>,
#> # overall_time_dnf <int>, overall_time_dsq <int>
cat("\nRace Participation:\n")
#>
#> Race Participation:
print(summary[[1]]$race_participation)
#> # A tibble: 1 × 32
#> race_id tot_racers `cat_Female MAS 1` `cat_Female MAS 2` `cat_Female SEN`
#> <chr> <int> <int> <int> <int>
#> 1 race-9973 97 1 1 1
#> # ℹ 27 more variables: `cat_Female U10` <int>, `cat_Female U12` <int>,
#> # `cat_Female U14` <int>, `cat_Female U16` <int>, `cat_Female U18` <int>,
#> # `cat_Female U21` <int>, `cat_Female U8` <int>, `cat_Male MAS 1` <int>,
#> # `cat_Male MAS 2` <int>, `cat_Male SEN` <int>, `cat_Male U10` <int>,
#> # `cat_Male U12` <int>, `cat_Male U14` <int>, `cat_Male U16` <int>,
#> # `cat_Male U18` <int>, `cat_Male U21` <int>, `cat_Male U8` <int>,
#> # club_ASR <int>, club_BOW <int>, club_BRO <int>, club_CHT <int>, …
Getting All Races
To extract all race tables:
# Get all races
all_races <- get_races(file_path)
cat("Found", length(all_races), "races\n")
#> Found 5 races
# Each element is a data frame with race results
if (length(all_races) > 0) {
  first_race_id <- names(all_races)[1]
  first_race_data <- all_races[[1]]
  cat("First race ID:", first_race_id, "\n")
  cat("First race participants:", nrow(first_race_data), "\n")
  cat("First race columns:", paste(names(first_race_data), collapse = ", "), "\n")
}
#> First race ID: race-9973
#> First race participants: 97
#> First race columns: Rank, Bib, (Rk), Cat., Name, Club, Run 1, Run 2, Run 3, Overall Time
Getting a Specific Race
To extract just one race by its ID:
# Get available race IDs first
races <- get_races(file_path)
race_ids <- names(races)
if (length(race_ids) > 0) {
  # Get the first race as a clean data frame
  specific_race <- get_race(file_path, race_ids[1])
  cat("Extracted race:", race_ids[1], "\n")
  cat("Participants:", nrow(specific_race), "\n")
  cat("Columns:", ncol(specific_race), "\n")
  # Show structure
  if (nrow(specific_race) > 0) {
    print(head(specific_race, 3))
  }
}
#> Extracted race: race-9973
#> Participants: 97
#> Columns: 10
#> # A tibble: 3 × 10
#> Rank Bib `(Rk)` Cat. Name Club `Run 1` `Run 2` `Run 3` `Overall Time`
#> <int> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 142 1 Male U21 BUNT… CHT 15.63 15.52 15.61 31.13
#> 2 2 138 1 Male U18 BROW… BRO 16.29 DNF 16.63 32.92
#> 3 3 136 2 Male U18 EVER… BRO 16.57 16.64 16.78 33.21
Getting Race Points
To extract points information for a specific race:
# First get race IDs
races <- get_races(file_path)
if (length(races) > 0) {
  # Get points for the first race
  race_id <- names(races)[1]
  points_data <- tryCatch({
    get_points(file_path, race_id = race_id)
  }, error = function(e) {
    # Points may not exist for all races
    NULL
  })
  if (!is.null(points_data) && nrow(points_data) > 0) {
    cat("Found", nrow(points_data), "points entries for", race_id, "\n")
    cat("\nSample points data:\n")
    print(head(points_data, 3))
  } else {
    cat("No points data available for", race_id, "\n")
  }
}
#> Found 97 points entries for race-9973
#>
#> Sample points data:
#> Rank Bib (Rk) Cat. Name LSERSA Summer Series: Fastest Female
#> 1 1 142 1 Male U21 BUNTON Ryan <NA>
#> 2 2 138 1 Male U18 BROWN Ben <NA>
#> 3 3 136 2 Male U18 EVEREST Toby <NA>
#> LSERSA Summer Series: Fastest Male LSERSA Summer Series: Female MAS1
#> 1 100.00 <NA>
#> 2 94.25 <NA>
#> 3 93.32 <NA>
#> LSERSA Summer Series: Female MAS2 LSERSA Summer Series: Female SEN
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Female U10 LSERSA Summer Series: Female U12
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Female U14 LSERSA Summer Series: Female U16
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Female U18 LSERSA Summer Series: Female U21
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Female U8 LSERSA Summer Series: Male MAS1
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Male MAS2 LSERSA Summer Series: Male SEN
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Male U10 LSERSA Summer Series: Male U12
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Male U14 LSERSA Summer Series: Male U16
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> LSERSA Summer Series: Male U18 LSERSA Summer Series: Male U21
#> 1 <NA> 15.00
#> 2 15.00 <NA>
#> 3 12.00 <NA>
#> LSERSA Summer Series: Male U8
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
Getting Racers Only
To extract just racer information with links:
# First get race IDs
races <- get_races(file_path)
if (length(races) > 0) {
  # Get racer information for the first race
  race_id <- names(races)[1]
  racers_only <- get_racers(file_path, race_id = race_id)
  cat("Found", nrow(racers_only), "racers in race", race_id, "\n")
  if (nrow(racers_only) > 0) {
    # Check for profile links
    if ("Profile URL" %in% names(racers_only)) {
      with_links <- sum(!is.na(racers_only$`Profile URL`) & racers_only$`Profile URL` != "")
      cat("Racers with profile links:", with_links, "\n")
    }
    # Show sample
    print(head(racers_only, 3))
  }
}
#> Found 97 racers in race race-9973
#> Racers with profile links: 97
#> # A tibble: 3 × 7
#> Rank Bib `(Rk)` Cat. Name `Profile URL` Club
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 142 1 Male U21 BUNTON Ryan https://skiresults.co.uk/peopl… CHT
#> 2 2 138 1 Male U18 BROWN Ben https://skiresults.co.uk/peopl… BRO
#> 3 3 136 2 Male U18 EVEREST Toby https://skiresults.co.uk/peopl… BRO
Race Performance Analysis
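Time columns in race tables mix numeric strings with status codes such as DNF, as the Run 2 column in the race output earlier in this vignette shows. The core step of any timing analysis is separating the two. Below is a minimal base R sketch of that step on hand-written values; summarise_times() is a hypothetical helper, not a skiresultsR function, and the choice to count only DNS, DNF and DSQ is an assumption.

```r
# Hand-written example values; real values would come from a race's
# "Overall Time" column.
times <- c("31.13", "32.92", "DNF", "DNS", "74.57", "dnf")

# Split a character vector of times into numeric finishes and status counts.
summarise_times <- function(x) {
  x <- trimws(as.character(x))
  secs <- suppressWarnings(as.numeric(x))  # status codes become NA
  finished <- secs[!is.na(secs)]
  # Non-numeric, non-empty entries are treated as status codes.
  status <- toupper(x[is.na(secs) & !is.na(x) & nzchar(x)])
  list(
    n_finished = length(finished),
    fastest = if (length(finished) > 0) min(finished) else NA_real_,
    slowest = if (length(finished) > 0) max(finished) else NA_real_,
    # Codes other than DNS, DNF and DSQ are not counted here.
    status_counts = table(factor(status, levels = c("DNS", "DNF", "DSQ")))
  )
}

res <- summarise_times(times)
print(res$n_finished)
print(res$status_counts)
```

Fixing the factor levels means every status appears in the counts even when it has zero occurrences, which keeps results comparable across races.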
# Get a specific race for analysis
races <- get_races(file_path)
if (length(races) > 0) {
  race_data <- get_race(file_path, names(races)[1])
  # Analyze time data if available (look for "Overall Time" or similar columns)
  time_col <- NULL
  for (col in c("Overall Time", "overall_time", "time", "Time")) {
    if (col %in% names(race_data)) {
      time_col <- col
      break
    }
  }
  if (!is.null(time_col)) {
    time_values <- race_data[[time_col]]
    # Try to convert to numeric
    time_numeric <- suppressWarnings(as.numeric(time_values))
    finished_times <- time_numeric[!is.na(time_numeric)]
    if (length(finished_times) > 0) {
      cat("Race Statistics:\n")
      cat("- Participants with times:", length(finished_times), "\n")
      cat("- Fastest time:", min(finished_times), "\n")
      cat("- Slowest time:", max(finished_times), "\n")
      cat("- Average time:", round(mean(finished_times), 2), "\n")
    }
    # Count DNS, DNF, DSQ
    time_char <- as.character(time_values)
    dns_count <- sum(grepl("^DNS$", time_char, ignore.case = TRUE))
    dnf_count <- sum(grepl("^DNF$", time_char, ignore.case = TRUE))
    dsq_count <- sum(grepl("^DSQ$", time_char, ignore.case = TRUE))
    if (dns_count > 0 || dnf_count > 0 || dsq_count > 0) {
      cat("\nStatus Distribution:\n")
      cat("- DNS:", dns_count, "\n")
      cat("- DNF:", dnf_count, "\n")
      cat("- DSQ:", dsq_count, "\n")
    }
  } else {
    cat("No time column found in race data\n")
  }
}
#> Race Statistics:
#> - Participants with times: 90
#> - Fastest time: 31.13
#> - Slowest time: 74.57
#> - Average time: 45.26
#>
#> Status Distribution:
#> - DNS: 4
#> - DNF: 3
#> - DSQ: 0
Points Analysis
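The points tables printed in this vignette are wide: one column per series category, with NA wherever a racer scored nothing in that category. For analysis it is often easier to work with one row per racer and category. The following is a minimal base R sketch of that reshaping; points_wide is made-up data, points_to_long() is a hypothetical helper rather than part of skiresultsR, and the default id_cols are assumed from the column names shown in the printed output.

```r
# Hypothetical wide points table: one column per category, NA where a racer
# has no points in that category. Values are character, as in printed output.
points_wide <- data.frame(
  Name = c("RACER One", "RACER Two", "RACER Three"),
  `Series: Fastest Male` = c("100.00", "94.25", NA),
  `Series: Male U18` = c(NA, "15.00", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Collapse the wide table into one row per (racer, category) with points.
points_to_long <- function(df,
                           id_cols = c("Rank", "Bib", "(Rk)", "Cat.", "Name")) {
  id_cols <- intersect(id_cols, names(df))   # keep only id columns present
  cat_cols <- setdiff(names(df), id_cols)    # everything else is a category
  if (length(cat_cols) == 0) {
    return(df[0, id_cols, drop = FALSE])
  }
  pieces <- lapply(cat_cols, function(col) {
    out <- df[, id_cols, drop = FALSE]
    out$category <- col
    # as.character() first so factor columns convert by label, not by code
    out$points <- suppressWarnings(as.numeric(as.character(df[[col]])))
    out
  })
  long <- do.call(rbind, pieces)
  long <- long[!is.na(long$points), , drop = FALSE]  # drop empty cells
  rownames(long) <- NULL
  long
}

points_long <- points_to_long(points_wide)
print(points_long)
```

In the long form, totals per racer or per category reduce to a single aggregate() or tapply() call.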
# Analyze points distribution for a specific race
races <- get_races(file_path)
if (length(races) > 0) {
  race_id <- names(races)[1]
  points_data <- tryCatch({
    get_points(file_path, race_id = race_id)
  }, error = function(e) NULL)
  if (!is.null(points_data) && nrow(points_data) > 0) {
    cat("Points data for", race_id, ":\n")
    cat("Total entries:", nrow(points_data), "\n")
    # Show summary
    print(head(points_data, 5))
  } else {
    cat("No points data available for analysis\n")
  }
}
#> Points data for race-9973 :
#> Total entries: 97
#> Rank Bib (Rk) Cat. Name
#> 1 1 142 1 Male U21 BUNTON Ryan
#> 2 2 138 1 Male U18 BROWN Ben
#> 3 3 136 2 Male U18 EVEREST Toby
#> 4 4 135 3 Male U18 ATKINSON Liam
#> 5 5 137 4 Male U18 COLLYER-TODD Lucas
#> LSERSA Summer Series: Fastest Female LSERSA Summer Series: Fastest Male
#> 1 <NA> 100.00
#> 2 <NA> 94.25
#> 3 <NA> 93.32
#> 4 <NA> 92.52
#> 5 <NA> 89.46
#> LSERSA Summer Series: Female MAS1 LSERSA Summer Series: Female MAS2
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Female SEN LSERSA Summer Series: Female U10
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Female U12 LSERSA Summer Series: Female U14
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Female U16 LSERSA Summer Series: Female U18
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Female U21 LSERSA Summer Series: Female U8
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Male MAS1 LSERSA Summer Series: Male MAS2
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Male SEN LSERSA Summer Series: Male U10
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Male U12 LSERSA Summer Series: Male U14
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> LSERSA Summer Series: Male U16 LSERSA Summer Series: Male U18
#> 1 <NA> <NA>
#> 2 <NA> 15.00
#> 3 <NA> 12.00
#> 4 <NA> 10.00
#> 5 <NA> 8.00
#> LSERSA Summer Series: Male U21 LSERSA Summer Series: Male U8
#> 1 15.00 <NA>
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 4 <NA> <NA>
#> 5 <NA> <NA>
Best Practices
2. Choosing the Right Function
- Use get_event() for comprehensive analysis
- Use get_event_summary() for quick metadata checks
- Use get_race() when you know the specific race you want
- Use helper functions for targeted data extraction, e.g. get_race_types()
3. Data Validation
Always validate your extracted data:
# Check data quality
event_data <- get_event(file_path)
# Use the top-level list if it already holds the components, else the inner element
event <- if ("event_dtls" %in% names(event_data)) event_data else event_data[[1]]
# Validate racers data
if (is.null(event$racers) || nrow(event$racers) == 0) {
  warning("No racers found - check HTML structure")
}
# Validate race data
if (is.null(event$races) || length(event$races) == 0) {
  warning("No races found - check HTML structure")
}
Conclusion
The skiresultsR package provides a thoughtful extraction
toolkit for ski race results. The functions are designed to be
intuitive, consistent, and in line with the interface published on the
skiresults.co.uk site, making it easy to work with this data in R.
For more detailed information about each function, see the function
documentation using ?function_name (e.g., ?get_event).
