Introduction

The skiresultsR package provides a comprehensive set of functions for extracting and analyzing ski race results from HTML files generated by skiresults.co.uk. This vignette will guide you through the main features and show you how to extract structured data from ski race events.

Installation

# Install from GitHub
devtools::install_github("justinjtownsend/skiresultsR")

# Or install from a local copy of the source (run from the package directory)
devtools::install()

Loading the Package

library(skiresultsR)

Sourcing the Data

# Get path to sample data
file_path <- system.file("extdata", "chatham_oct2023.html", package = "skiresultsR")
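
Note that system.file() returns an empty string, not an error, when the requested file cannot be found, so a quick check before parsing can save confusion later. A minimal sketch using only base R; the file name no_such_file.html is a deliberately missing, hypothetical example:

```r
# system.file() yields "" when the file is not found in the package
missing_path <- system.file("extdata", "no_such_file.html", package = "base")
path_found <- nzchar(missing_path)

if (!path_found) {
  message("Sample file not found; check the package installation and file name")
}
```

The same nzchar() test can be applied to file_path before it is passed to any of the extraction functions.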

Main Functions Overview

The skiresultsR package provides six main functions, organized into primary and helper functions:

Primary Functions

  • get_event() - Extract complete event data (recommended starting point)
  • get_races() - Extract all race tables from an event
  • get_race() - Extract a specific race by race type/ID

Helper Functions

  • get_event_summary() - Summarise race statistics and participation for an event
  • get_points() - Extract points data for a specific race
  • get_racers() - Extract racer information, including profile links, for a specific race

Basic Usage

1. Getting Complete Event Data

The fastest way to start is with get_event(), which extracts all available event data:

# Extract complete event data
event_data <- get_event(file_path)

# View the structure
str(event_data, max.level = 1)
#> List of 6
#>  $ event_dtls :'data.frame': 1 obs. of  6 variables:
#>  $ race_types : tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ races      :List of 5
#>  $ racers     : tibble [97 × 7] (S3: tbl_df/tbl/data.frame)
#>  $ race_points: tibble [0 × 0] (S3: tbl_df/tbl/data.frame)
#>  Named list()
#>  $ clubs      : tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#>  - attr(*, "class")= chr [1:2] "skiresults_event" "list"

The get_event() function returns a list of class skiresults_event. As the str() output above shows, its elements are accessed directly by name (for example, event_data$racers):

  • event_dtls: Event details (title, date, slope, format, status)
  • race_types: Race types information
  • races: All race tables with complete race information
  • racers: Unique racers with their profile links and club information
  • race_points: Points data across all categories and races (empty for this sample event)
  • clubs: Club information found in the event

2. Exploring Event Details

# get_event() returns the event object itself, so no unwrapping is needed
# (event_data[[1]] would select only the event_dtls data frame)
event <- event_data

# Display key information from event_dtls
cat("Event Title:", event$event_dtls$title, "\n")
cat("Date:", event$event_dtls$date, "\n")
cat("Venue:", event$event_dtls$slope, "\n")

# Show available race types
cat("\nAvailable Race Types:\n")
print(event$race_types$race_type)

3. Working with Race Data

# View available races
race_names <- names(event_data$races)
cat("Found", length(race_names), "races:\n")
#> Found 5 races:
cat(paste(race_names, collapse = ", "), "\n")

# Examine the first race
if (length(race_names) > 0) {
  first_race <- event_data$races[[1]]
  cat("\nFirst race structure:\n")
  cat("- Race ID:", race_names[1], "\n")
  cat("- Number of participants:", nrow(first_race), "\n")
  cat("- Columns:", paste(names(first_race), collapse = ", "), "\n")
  
  # Show first few rows
  if (nrow(first_race) > 0) {
    cat("\nFirst few participants:\n")
    print(head(first_race, 3))
  }
}

4. Analyzing Racer Information

# View racer information
racers <- event_data$racers
cat("Found", nrow(racers), "unique racers\n")
#> Found 97 unique racers

cat("\nFirst few racers:\n")
#> 
#> First few racers:
print(head(racers, 5))

# Check for club information
if ("Club" %in% names(racers)) {
  racers_with_clubs <- sum(!is.na(racers$Club) & racers$Club != "")
  cat("\nRacers with club information:", racers_with_clubs, "\n")
}
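
Once the racers table is loaded, base R is enough for a quick head count per club. The sketch below uses a small hand-built data frame as a stand-in (the three names and clubs are copied from the sample output later in this vignette); with real data, the racers element of the event would take its place.

```r
# Toy stand-in for the racers table; real data comes from get_event()
racers_demo <- data.frame(
  Name = c("BUNTON Ryan", "BROWN Ben", "EVEREST Toby"),
  Club = c("CHT", "BRO", "BRO"),
  stringsAsFactors = FALSE
)

# Count racers per club, largest club first
club_counts <- sort(table(racers_demo$Club), decreasing = TRUE)
print(club_counts)
```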

Specialized Functions

Getting Event Summary Only

If you need event summary statistics, use get_event_summary() with an event object:

# Get event summary
summary <- get_event_summary(event_data)

# Access race summary and participation statistics
event_id <- names(summary)[1]
cat("Event ID:", event_id, "\n")
#> Event ID: 1319
cat("\nRace Summary:\n")
#> 
#> Race Summary:
print(summary[[1]]$race_summary)
#> # A tibble: 1 × 8
#>   race_id   tot_racers overall_time_fastest overall_time_slowest
#>   <chr>          <int>                <dbl>                <dbl>
#> 1 race-9973         97                 31.1                 74.6
#> # ℹ 4 more variables: overall_time_average <dbl>, overall_time_dns <int>,
#> #   overall_time_dnf <int>, overall_time_dsq <int>
cat("\nRace Participation:\n")
#> 
#> Race Participation:
print(summary[[1]]$race_participation)
#> # A tibble: 1 × 32
#>   race_id   tot_racers `cat_Female MAS 1` `cat_Female MAS 2` `cat_Female SEN`
#>   <chr>          <int>              <int>              <int>            <int>
#> 1 race-9973         97                  1                  1                1
#> # ℹ 27 more variables: `cat_Female U10` <int>, `cat_Female U12` <int>,
#> #   `cat_Female U14` <int>, `cat_Female U16` <int>, `cat_Female U18` <int>,
#> #   `cat_Female U21` <int>, `cat_Female U8` <int>, `cat_Male MAS 1` <int>,
#> #   `cat_Male MAS 2` <int>, `cat_Male SEN` <int>, `cat_Male U10` <int>,
#> #   `cat_Male U12` <int>, `cat_Male U14` <int>, `cat_Male U16` <int>,
#> #   `cat_Male U18` <int>, `cat_Male U21` <int>, `cat_Male U8` <int>,
#> #   club_ASR <int>, club_BOW <int>, club_BRO <int>, club_CHT <int>, …
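
The participation table is wide, with one cat_ column per category and one club_ column per club. A sketch of pulling out just the category counts as a named vector, using a hand-built one-row data frame that copies a few of the columns printed above (the name participation_demo is illustrative):

```r
# Hand-built subset of the race_participation output shown above
participation_demo <- data.frame(
  race_id = "race-9973",
  tot_racers = 97L,
  `cat_Female MAS 1` = 1L,
  `cat_Female MAS 2` = 1L,
  `cat_Female SEN` = 1L,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the category columns and strip the "cat_" prefix
is_cat <- startsWith(names(participation_demo), "cat_")
cat_counts <- unlist(participation_demo[1, is_cat])
names(cat_counts) <- sub("^cat_", "", names(cat_counts))
print(cat_counts)
```

Swapping "cat_" for "club_" should give the per-club counts in the same way.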

Getting All Races

To extract all race tables:

# Get all races
all_races <- get_races(file_path)

cat("Found", length(all_races), "races\n")
#> Found 5 races

# Each element is a data frame with race results
if (length(all_races) > 0) {
  first_race_id <- names(all_races)[1]
  first_race_data <- all_races[[1]]
  cat("First race ID:", first_race_id, "\n")
  cat("First race participants:", nrow(first_race_data), "\n")
  cat("First race columns:", paste(names(first_race_data), collapse = ", "), "\n")
}
#> First race ID: race-9973 
#> First race participants: 97 
#> First race columns: Rank, Bib, (Rk), Cat., Name, Club, Run 1, Run 2, Run 3, Overall Time

Getting a Specific Race

To extract just one race by its ID:

# Get available race IDs first
races <- get_races(file_path)
race_ids <- names(races)

if (length(race_ids) > 0) {
  # Get the first race as a clean data frame
  specific_race <- get_race(file_path, race_ids[1])
  
  cat("Extracted race:", race_ids[1], "\n")
  cat("Participants:", nrow(specific_race), "\n")
  cat("Columns:", ncol(specific_race), "\n")
  
  # Show structure
  if (nrow(specific_race) > 0) {
    print(head(specific_race, 3))
  }
}
#> Extracted race: race-9973 
#> Participants: 97 
#> Columns: 10 
#> # A tibble: 3 × 10
#>    Rank   Bib `(Rk)` Cat.     Name  Club  `Run 1` `Run 2` `Run 3` `Overall Time`
#>   <int> <int>  <int> <chr>    <chr> <chr> <chr>   <chr>   <chr>   <chr>         
#> 1     1   142      1 Male U21 BUNT… CHT   15.63   15.52   15.61   31.13         
#> 2     2   138      1 Male U18 BROW… BRO   16.29   DNF     16.63   32.92         
#> 3     3   136      2 Male U18 EVER… BRO   16.57   16.64   16.78   33.21
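
The run and overall time columns come back as character because they can hold status codes such as DNF. In the three rows printed above, Overall Time matches the sum of each racer's two fastest runs. The sketch below checks that on a hand-copied version of those rows; whether the rule holds for every race format is an assumption, not something confirmed by the package.

```r
# Hand-copied from the first three rows of the race output above
race_demo <- data.frame(
  Bib = c(142L, 138L, 136L),
  `Run 1` = c("15.63", "16.29", "16.57"),
  `Run 2` = c("15.52", "DNF", "16.64"),
  `Run 3` = c("15.61", "16.63", "16.78"),
  `Overall Time` = c("31.13", "32.92", "33.21"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert run columns to numeric; status codes such as "DNF" become NA
run_cols <- c("Run 1", "Run 2", "Run 3")
runs <- sapply(race_demo[run_cols], function(x) suppressWarnings(as.numeric(x)))

# Sum the two fastest completed runs per racer (NA if fewer than two)
best_two <- apply(runs, 1, function(r) {
  done <- sort(r)  # sort() drops NA values by default
  if (length(done) < 2) NA_real_ else sum(done[1:2])
})

# Compare against the reported overall time
overall <- as.numeric(race_demo[["Overall Time"]])
matches <- isTRUE(all.equal(round(best_two, 2), overall))
print(matches)
```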

Getting Race Points

To extract points information for a specific race:

# First get race IDs
races <- get_races(file_path)
if (length(races) > 0) {
  # Get points for the first race
  race_id <- names(races)[1]
  points_data <- tryCatch({
    get_points(file_path, race_id = race_id)
  }, error = function(e) {
    # Points may not exist for all races
    NULL
  })
  
  if (!is.null(points_data) && nrow(points_data) > 0) {
    cat("Found", nrow(points_data), "points entries for", race_id, "\n")
    cat("\nSample points data:\n")
    print(head(points_data, 3))
  } else {
    cat("No points data available for", race_id, "\n")
  }
}
#> Found 97 points entries for race-9973 
#> 
#> Sample points data:
#>   Rank Bib (Rk)     Cat.         Name LSERSA Summer Series: Fastest Female
#> 1    1 142    1 Male U21  BUNTON Ryan                                 <NA>
#> 2    2 138    1 Male U18    BROWN Ben                                 <NA>
#> 3    3 136    2 Male U18 EVEREST Toby                                 <NA>
#>   LSERSA Summer Series: Fastest Male LSERSA Summer Series: Female MAS1
#> 1                             100.00                              <NA>
#> 2                              94.25                              <NA>
#> 3                              93.32                              <NA>
#>   LSERSA Summer Series: Female MAS2 LSERSA Summer Series: Female SEN
#> 1                              <NA>                             <NA>
#> 2                              <NA>                             <NA>
#> 3                              <NA>                             <NA>
#>   LSERSA Summer Series: Female U10 LSERSA Summer Series: Female U12
#> 1                             <NA>                             <NA>
#> 2                             <NA>                             <NA>
#> 3                             <NA>                             <NA>
#>   LSERSA Summer Series: Female U14 LSERSA Summer Series: Female U16
#> 1                             <NA>                             <NA>
#> 2                             <NA>                             <NA>
#> 3                             <NA>                             <NA>
#>   LSERSA Summer Series: Female U18 LSERSA Summer Series: Female U21
#> 1                             <NA>                             <NA>
#> 2                             <NA>                             <NA>
#> 3                             <NA>                             <NA>
#>   LSERSA Summer Series: Female U8 LSERSA Summer Series: Male MAS1
#> 1                            <NA>                            <NA>
#> 2                            <NA>                            <NA>
#> 3                            <NA>                            <NA>
#>   LSERSA Summer Series: Male MAS2 LSERSA Summer Series: Male SEN
#> 1                            <NA>                           <NA>
#> 2                            <NA>                           <NA>
#> 3                            <NA>                           <NA>
#>   LSERSA Summer Series: Male U10 LSERSA Summer Series: Male U12
#> 1                           <NA>                           <NA>
#> 2                           <NA>                           <NA>
#> 3                           <NA>                           <NA>
#>   LSERSA Summer Series: Male U14 LSERSA Summer Series: Male U16
#> 1                           <NA>                           <NA>
#> 2                           <NA>                           <NA>
#> 3                           <NA>                           <NA>
#>   LSERSA Summer Series: Male U18 LSERSA Summer Series: Male U21
#> 1                           <NA>                          15.00
#> 2                          15.00                           <NA>
#> 3                          12.00                           <NA>
#>   LSERSA Summer Series: Male U8
#> 1                          <NA>
#> 2                          <NA>
#> 3                          <NA>
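
Because each points category is its own column, most cells are NA for any given racer. Reshaping to one row per racer and category makes the data easier to filter and plot. A base R sketch on a hand-built subset of the output above (three racers, three categories); the column names category and points are illustrative choices, not names used by the package:

```r
# Hand-built subset of the points output; points are stored as character
points_wide <- data.frame(
  Name = c("BUNTON Ryan", "BROWN Ben", "EVEREST Toby"),
  `LSERSA Summer Series: Fastest Male` = c("100.00", "94.25", "93.32"),
  `LSERSA Summer Series: Male U18` = c(NA, "15.00", "12.00"),
  `LSERSA Summer Series: Male U21` = c("15.00", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Stack every category column into Name / category / points rows
category_cols <- setdiff(names(points_wide), "Name")
points_long <- do.call(rbind, lapply(category_cols, function(col) {
  data.frame(
    Name = points_wide$Name,
    category = col,
    points = as.numeric(points_wide[[col]]),
    stringsAsFactors = FALSE
  )
}))

# Drop the categories a racer did not score in
points_long <- points_long[!is.na(points_long$points), ]
rownames(points_long) <- NULL
print(points_long)
```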

Getting Racers Only

To extract just racer information with links:

# First get race IDs
races <- get_races(file_path)
if (length(races) > 0) {
  # Get racer information for the first race
  race_id <- names(races)[1]
  racers_only <- get_racers(file_path, race_id = race_id)
  
  cat("Found", nrow(racers_only), "racers in race", race_id, "\n")
  
  if (nrow(racers_only) > 0) {
    # Check for profile links
    if ("Profile URL" %in% names(racers_only)) {
      with_links <- sum(!is.na(racers_only$`Profile URL`) & racers_only$`Profile URL` != "")
      cat("Racers with profile links:", with_links, "\n")
    }
    
    # Show sample
    print(head(racers_only, 3))
  }
}
#> Found 97 racers in race race-9973 
#> Racers with profile links: 97 
#> # A tibble: 3 × 7
#>   Rank  Bib   `(Rk)` Cat.     Name         `Profile URL`                   Club 
#>   <chr> <chr> <chr>  <chr>    <chr>        <chr>                           <chr>
#> 1 1     142   1      Male U21 BUNTON Ryan  https://skiresults.co.uk/peopl… CHT  
#> 2 2     138   1      Male U18 BROWN Ben    https://skiresults.co.uk/peopl… BRO  
#> 3 3     136   2      Male U18 EVEREST Toby https://skiresults.co.uk/peopl… BRO

Race Performance Analysis

# Get a specific race for analysis
races <- get_races(file_path)
if (length(races) > 0) {
  race_data <- get_race(file_path, names(races)[1])
  
  # Analyze time data if available (look for "Overall Time" or similar columns)
  time_col <- NULL
  for (col in c("Overall Time", "overall_time", "time", "Time")) {
    if (col %in% names(race_data)) {
      time_col <- col
      break
    }
  }
  
  if (!is.null(time_col)) {
    time_values <- race_data[[time_col]]
    # Try to convert to numeric
    time_numeric <- suppressWarnings(as.numeric(time_values))
    finished_times <- time_numeric[!is.na(time_numeric)]
    
    if (length(finished_times) > 0) {
      cat("Race Statistics:\n")
      cat("- Participants with times:", length(finished_times), "\n")
      cat("- Fastest time:", min(finished_times), "\n")
      cat("- Slowest time:", max(finished_times), "\n")
      cat("- Average time:", round(mean(finished_times), 2), "\n")
    }
    
    # Count DNS, DNF, DSQ
    time_char <- as.character(time_values)
    dns_count <- sum(grepl("^DNS$", time_char, ignore.case = TRUE))
    dnf_count <- sum(grepl("^DNF$", time_char, ignore.case = TRUE))
    dsq_count <- sum(grepl("^DSQ$", time_char, ignore.case = TRUE))
    
    if (dns_count > 0 || dnf_count > 0 || dsq_count > 0) {
      cat("\nStatus Distribution:\n")
      cat("- DNS:", dns_count, "\n")
      cat("- DNF:", dnf_count, "\n")
      cat("- DSQ:", dsq_count, "\n")
    }
  } else {
    cat("No time column found in race data\n")
  }
}
#> Race Statistics:
#> - Participants with times: 90 
#> - Fastest time: 31.13 
#> - Slowest time: 74.57 
#> - Average time: 45.26 
#> 
#> Status Distribution:
#> - DNS: 4 
#> - DNF: 3 
#> - DSQ: 0
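
If the same statistics are needed for several races, the logic above can be wrapped in a small function. The sketch below is one hypothetical way to do it: summarise_times() is not part of skiresultsR, and the input vector is made up for illustration.

```r
# Hypothetical helper: summarise a character vector of overall times
summarise_times <- function(times) {
  times <- as.character(times)
  numeric_times <- suppressWarnings(as.numeric(times))
  finished <- numeric_times[!is.na(numeric_times)]
  status <- toupper(trimws(times))
  has_times <- length(finished) > 0
  list(
    finishers = length(finished),
    fastest = if (has_times) min(finished) else NA_real_,
    slowest = if (has_times) max(finished) else NA_real_,
    average = if (has_times) round(mean(finished), 2) else NA_real_,
    dns = sum(status == "DNS", na.rm = TRUE),
    dnf = sum(status == "DNF", na.rm = TRUE),
    dsq = sum(status == "DSQ", na.rm = TRUE)
  )
}

# Illustrative input mixing times and status codes
demo_times <- c("31.13", "32.92", "DNF", "DNS", "33.21")
time_summary <- summarise_times(demo_times)
str(time_summary)
```

Returning a list keeps the function easy to combine across races, for example with lapply() over the result of get_races().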

Points Analysis

# Analyze points distribution for a specific race
races <- get_races(file_path)
if (length(races) > 0) {
  race_id <- names(races)[1]
  points_data <- tryCatch({
    get_points(file_path, race_id = race_id)
  }, error = function(e) NULL)
  
  if (!is.null(points_data) && nrow(points_data) > 0) {
    cat("Points data for", race_id, ":\n")
    cat("Total entries:", nrow(points_data), "\n")
    # Show summary
    print(head(points_data, 5))
  } else {
    cat("No points data available for analysis\n")
  }
}
#> Points data for race-9973 :
#> Total entries: 97 
#>   Rank Bib (Rk)     Cat.               Name
#> 1    1 142    1 Male U21        BUNTON Ryan
#> 2    2 138    1 Male U18          BROWN Ben
#> 3    3 136    2 Male U18       EVEREST Toby
#> 4    4 135    3 Male U18      ATKINSON Liam
#> 5    5 137    4 Male U18 COLLYER-TODD Lucas
#>   LSERSA Summer Series: Fastest Female LSERSA Summer Series: Fastest Male
#> 1                                 <NA>                             100.00
#> 2                                 <NA>                              94.25
#> 3                                 <NA>                              93.32
#> 4                                 <NA>                              92.52
#> 5                                 <NA>                              89.46
#>   LSERSA Summer Series: Female MAS1 LSERSA Summer Series: Female MAS2
#> 1                              <NA>                              <NA>
#> 2                              <NA>                              <NA>
#> 3                              <NA>                              <NA>
#> 4                              <NA>                              <NA>
#> 5                              <NA>                              <NA>
#>   LSERSA Summer Series: Female SEN LSERSA Summer Series: Female U10
#> 1                             <NA>                             <NA>
#> 2                             <NA>                             <NA>
#> 3                             <NA>                             <NA>
#> 4                             <NA>                             <NA>
#> 5                             <NA>                             <NA>
#>   LSERSA Summer Series: Female U12 LSERSA Summer Series: Female U14
#> 1                             <NA>                             <NA>
#> 2                             <NA>                             <NA>
#> 3                             <NA>                             <NA>
#> 4                             <NA>                             <NA>
#> 5                             <NA>                             <NA>
#>   LSERSA Summer Series: Female U16 LSERSA Summer Series: Female U18
#> 1                             <NA>                             <NA>
#> 2                             <NA>                             <NA>
#> 3                             <NA>                             <NA>
#> 4                             <NA>                             <NA>
#> 5                             <NA>                             <NA>
#>   LSERSA Summer Series: Female U21 LSERSA Summer Series: Female U8
#> 1                             <NA>                            <NA>
#> 2                             <NA>                            <NA>
#> 3                             <NA>                            <NA>
#> 4                             <NA>                            <NA>
#> 5                             <NA>                            <NA>
#>   LSERSA Summer Series: Male MAS1 LSERSA Summer Series: Male MAS2
#> 1                            <NA>                            <NA>
#> 2                            <NA>                            <NA>
#> 3                            <NA>                            <NA>
#> 4                            <NA>                            <NA>
#> 5                            <NA>                            <NA>
#>   LSERSA Summer Series: Male SEN LSERSA Summer Series: Male U10
#> 1                           <NA>                           <NA>
#> 2                           <NA>                           <NA>
#> 3                           <NA>                           <NA>
#> 4                           <NA>                           <NA>
#> 5                           <NA>                           <NA>
#>   LSERSA Summer Series: Male U12 LSERSA Summer Series: Male U14
#> 1                           <NA>                           <NA>
#> 2                           <NA>                           <NA>
#> 3                           <NA>                           <NA>
#> 4                           <NA>                           <NA>
#> 5                           <NA>                           <NA>
#>   LSERSA Summer Series: Male U16 LSERSA Summer Series: Male U18
#> 1                           <NA>                           <NA>
#> 2                           <NA>                          15.00
#> 3                           <NA>                          12.00
#> 4                           <NA>                          10.00
#> 5                           <NA>                           8.00
#>   LSERSA Summer Series: Male U21 LSERSA Summer Series: Male U8
#> 1                          15.00                          <NA>
#> 2                           <NA>                          <NA>
#> 3                           <NA>                          <NA>
#> 4                           <NA>                          <NA>
#> 5                           <NA>                          <NA>

Best Practices

1. Error Handling

Always check if files exist and handle potential errors:

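tryCatch({
  # Fail early with a clear message if the file is missing
  if (!file.exists(file_path)) {
    stop("File not found: ", file_path)
  }
  event_data <- get_event(file_path)
  cat("Successfully extracted event data\n")
}, error = function(e) {
  cat("Error extracting data:", conditionMessage(e), "\n")
})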
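
The same pattern can be factored into a reusable wrapper that checks the path first and returns NULL with a warning on failure. In this sketch safe_extract() is a hypothetical helper, and base R's readLines() stands in for an extractor such as get_event() so that the example runs without the package:

```r
# Hypothetical wrapper: check the file, then run any extractor safely
safe_extract <- function(path, extractor) {
  if (!file.exists(path)) {
    warning("File not found: ", path, call. = FALSE)
    return(NULL)
  }
  tryCatch(
    extractor(path),
    error = function(e) {
      warning("Extraction failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Demonstrate with a temporary file and readLines() as the extractor
tmp <- tempfile(fileext = ".html")
writeLines("<html></html>", tmp)
ok <- safe_extract(tmp, readLines)

# A path that does not exist yields NULL (warning suppressed here)
absent <- suppressWarnings(
  safe_extract(file.path(tempdir(), "no_such_file.html"), readLines)
)
unlink(tmp)
```

With the package loaded, a call such as safe_extract(file_path, get_event) would follow the same shape.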

2. Choosing the Right Function

  • Use get_event() when you want everything from an event in one object
  • Use get_races() or get_race() when you only need result tables
  • Use get_points() or get_racers() for the points or racer details of a single race
  • Use get_event_summary() for summary statistics from an event object

3. Data Validation

Always validate your extracted data:

# Check data quality
event_data <- get_event(file_path)
event <- event_data  # get_event() returns the event object directly

# Validate racers data
if (is.null(event$racers) || nrow(event$racers) == 0) {
  warning("No racers found - check HTML structure")
}

# Validate race data
if (is.null(event$races) || length(event$races) == 0) {
  warning("No races found - check HTML structure")
}
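
These checks can be gathered into one function that reports every missing or empty element at once. A sketch, assuming the element names shown in the str() output earlier; check_event() and the mock list are hypothetical and exist only for this example:

```r
# Hypothetical validator: list the required elements that are missing or empty
check_event <- function(event,
                        required = c("event_dtls", "race_types", "races",
                                     "racers", "clubs")) {
  problems <- character(0)
  for (nm in required) {
    item <- event[[nm]]
    # Data frames are measured by rows, everything else by length
    size <- if (is.data.frame(item)) nrow(item) else length(item)
    if (is.null(item)) {
      problems <- c(problems, paste0(nm, ": missing"))
    } else if (size == 0) {
      problems <- c(problems, paste0(nm, ": empty"))
    }
  }
  problems
}

# Mock event with an empty race_types table and no clubs element
mock_event <- list(
  event_dtls = data.frame(title = "Demo"),
  race_types = data.frame(),
  races = list(a = data.frame(Rank = 1L)),
  racers = data.frame(Name = "A")
)
problems <- check_event(mock_event)
print(problems)
```

An empty result means every required element is present and non-empty. The race_points element is left out of the default list because it is empty for the sample event.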

Conclusion

The skiresultsR package provides an extraction toolkit for ski race results. The functions are designed to be intuitive, consistent, and in line with the interface published on the skiresults site, making it easy to work with this data in R.

For more detailed information about each function, see the function documentation using ?function_name (e.g., ?get_event).

Next Steps

  • Explore the function reference documentation
  • Try the functions with your own HTML files
  • Combine the extracted data with visualization packages like ggplot2
  • Use the data for statistical analysis of race performance