Introduction to Python Resources

Challenge: Data Series

Solidify your Python data-manipulation skills with these data series challenges.


What are Data Series?

A "data series" is a collection of values, with an associated label for each value in the collection. In a data series, each value is a single piece of data (such as a string, number, or boolean). Likewise, each label is a single piece of data. An associated label and value together are called a "point".

For example, this is a data series containing the average number of daylight hours per month of the year in Washington, DC:

Month       Hours of daylight
January     9.82
February    10.83
March       11.98
April       13.28
May         14.37
June        14.93
July        14.67
August      13.72
September   12.48
October     11.23
November    10.10
December    9.50

In the example above, the "Hours of daylight" entries are the values of the series, and the "Month" entries are their associated labels. The pair of the month April and its 13.28 hours of daylight represents one point in this data series.

A data series is a conceptual collection rather than a single built-in Python type, so it can be represented in several ways in Python. Each representation has its own pros and cons.

In the example, the labels would be best represented by strings (str), while the values could either be strings (str) or floating-point numbers (float). Keeping the values as strings would preserve their formatting, while keeping them as floating-point numbers would make it easier to perform math with them.
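
As a rough illustration of that trade-off, the sketch below stores three points from the daylight table both ways. The variable names are made up for this example and are not part of the challenge.

```python
# Three points from the daylight table, with the values kept as strings.
daylight_as_text = [["January", "9.82"], ["November", "10.10"], ["December", "9.50"]]

# The same points with the values converted to floating-point numbers.
daylight_as_numbers = [[month, float(hours)] for month, hours in daylight_as_text]

# Strings preserve the original formatting, including trailing zeros.
print(daylight_as_text[1][1])      # 10.10
print(daylight_as_numbers[1][1])   # 10.1

# Floats can be used in arithmetic directly; strings would need converting first.
total_hours = sum(hours for _month, hours in daylight_as_numbers)
print(round(total_hours, 2))
```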

Challenge Tasks

In the following sections, you'll be given different Python representations of a data series. Your goal is to complete all of the following tasks for each representation. While there is an expected output for each task, there are no example solutions. Each representation below includes label metadata about column names and data point ordering.

For each format, define a function to complete each of the following tasks on the predefined series. Each function should take the predefined values as inputs, complete its task, and return its result. Call each function, print its output, and verify that it satisfies the corresponding task. Do not modify or reassign the predefined variables unless the task specifically says to do so:

  1. Get the last-appearing data point in the series, as a dict with the column names as keys.

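    Example Output*
    *This task's expected output depends on the order of the points, so it is different for each series representation. The example below matches the list of Label-Value lists representation.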
    > {'year': 2020, 'unemployment_rate': 4}
  2. Get the first five data points, in the order they appear in the series, as a list of lists.

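    Example Output*
    *This task's expected output depends on the order of the points, so it is different for each series representation. The example below matches the list of Label-Value lists representation.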
    > [[2005, 6], [2001, 5], [2013, 8], [2011, 10], [2017, 5]]
  3. Check if the years 2000 and 2010 are each included in the data series.

    Expected Output*
    *This task's expected output is the same for all series representations.

    For the year 2000:

    > False

    For the year 2010:

    > True
  4. Get the unemployment rate for the most recent year.

    Expected Output*
    *This task's expected output is the same for all series representations.
    > 7
  5. Get the list of the unemployment rates, ordered by year.

    Expected Output*
    *This task's expected output is the same for all series representations.
    > [5, 6, 6, 6, 6, 5, 5, 5, 8, 11, 10, 9, 8, 7, 6, 5, 5, 4, 4, 4, 7]
  6. Get the largest unemployment rate.

    Expected Output*
    *This task's expected output is the same for all series representations.
    > 11
  7. Create a new series, in the same format as the original, with the "employment rate" instead of the "unemployment rate" (the rates are listed in percentages).

    Example Output*
    *This task's expected output is different for each series representation. The example below uses the list of Label-Value lists representation.
    > [[2005, 94], [2001, 95], [2013, 92], [2011, 90], [2017, 95], [2006, 95], [2009, 92], [2021, 93], [2002, 94], [2007, 95], [2012, 91], [2014, 93], [2003, 94], [2019, 96], [2016, 95], [2015, 94], [2018, 96], [2008, 95], [2010, 89], [2004, 94], [2020, 96]]
  8. Create a new series, in the same format as the original, but only containing the data points where the unemployment rate is at least 7.

    Example Output*
    *This task's expected output is different for each series representation. The example below uses the list of Label-Value lists representation.
    > [[2013, 8], [2011, 10], [2009, 8], [2021, 7], [2012, 9], [2014, 7], [2010, 11]]
  9. Create a dict containing the count of the number of times each unemployment rate appears in the series.

    Expected Output*
    *This task's expected output is the same for all series representations.
    > {6: 5, 5: 6, 8: 2, 10: 1, 7: 2, 9: 1, 4: 3, 11: 1}
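
The define, call, print, and verify workflow described before the task list might look like the sketch below. It uses a hypothetical task (averaging the values) that is not one of the nine above, and a small made-up series name, daylight_hours, built from the daylight table rather than the challenge data.

```python
# Hypothetical predefined series (not the challenge data): three points
# from the daylight table, as a list of label-value lists.
daylight_hours = [["January", 9.82], ["February", 10.83], ["March", 11.98]]


def get_average_value(series):
    """Return the mean of the series values without modifying the series."""
    values = [value for _label, value in series]
    return sum(values) / len(values)


# Call the function, print its output, and verify it against a result
# worked out by hand: (9.82 + 10.83 + 11.98) / 3 is about 10.88.
average = get_average_value(daylight_hours)
print(round(average, 2))
assert round(average, 2) == 10.88

# The predefined series still has its original three points.
assert len(daylight_hours) == 3
```

Passing the series in as an argument, rather than reading a global variable inside the function, lets the same function be reused and makes it easier to check that the input was left unchanged.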

Series Representation: list of Label-Value lists

One of the simpler representations is a list of lists, where each element of the list is a series point represented by a list of length 2, with the label first and the value second:

# Series Representation
unemployment_rates = [
    [2005, 6],
    [2001, 5],
    [2013, 8],
    [2011, 10],
    [2017, 5],
    [2006, 5],
    [2009, 8],
    [2021, 7],
    [2002, 6],
    [2007, 5],
    [2012, 9],
    [2014, 7],
    [2003, 6],
    [2019, 4],
    [2016, 5],
    [2015, 6],
    [2018, 4],
    [2008, 5],
    [2010, 11],
    [2004, 6],
    [2020, 4]
]
# Label Metadata
columns = ["year", "unemployment_rate"]
label_order = [
    2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010,
    2011, 2012, 2013, 2014, 2015,
    2016, 2017, 2018, 2019, 2020,
    2021
]
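
To get a feel for this representation before starting the tasks, here is a minimal sketch of reading points from a list of label-value lists. It uses three points from the daylight table under the assumed name daylight_hours, so it is an illustration rather than a solution.

```python
# Three daylight points, deliberately out of calendar order.
daylight_hours = [
    ["March", 11.98],
    ["January", 9.82],
    ["February", 10.83],
]

# Each point is a two-element list: index 0 is the label, index 1 is the value.
first_point = daylight_hours[0]
print(first_point[0], first_point[1])   # March 11.98

# Unpacking each point in a loop is usually easier to read than indexing.
for month, hours in daylight_hours:
    print(f"{month}: {hours}")

# Looking up a value by its label means scanning the list for a matching label.
february_hours = next(hours for month, hours in daylight_hours if month == "February")
print(february_hours)   # 10.83
```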

Series Representation: list of Column-Value dicts

A more human-readable representation is a list of dicts, where each element of the list is a series point represented by a dict. Each point contains both its label and value as the dict's values, with their column names as their keys:

# Series Representation
unemployment_rates = [
    {"unemployment_rate": 8, "year": 2013},
    {"unemployment_rate": 5, "year": 2006},
    {"unemployment_rate": 5, "year": 2017},
    {"unemployment_rate": 6, "year": 2015},
    {"unemployment_rate": 6, "year": 2002},
    {"unemployment_rate": 4, "year": 2019},
    {"unemployment_rate": 9, "year": 2012},
    {"unemployment_rate": 4, "year": 2018},
    {"unemployment_rate": 6, "year": 2003},
    {"unemployment_rate": 5, "year": 2007},
    {"unemployment_rate": 11, "year": 2010},
    {"unemployment_rate": 7, "year": 2014},
    {"unemployment_rate": 6, "year": 2004},
    {"unemployment_rate": 5, "year": 2016},
    {"unemployment_rate": 6, "year": 2005},
    {"unemployment_rate": 7, "year": 2021},
    {"unemployment_rate": 5, "year": 2001},
    {"unemployment_rate": 4, "year": 2020},
    {"unemployment_rate": 10, "year": 2011},
    {"unemployment_rate": 8, "year": 2009},
    {"unemployment_rate": 5, "year": 2008}
]
# Label Metadata
columns = ["year", "unemployment_rate"]
label_order = [
    2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010,
    2011, 2012, 2013, 2014, 2015,
    2016, 2017, 2018, 2019, 2020,
    2021
]
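
A short sketch of working with this representation follows, again using daylight points under assumed names (daylight_hours and its column names) instead of the challenge data.

```python
# Two daylight points as dicts keyed by column name.
daylight_hours = [
    {"hours_of_daylight": 11.98, "month": "March"},
    {"hours_of_daylight": 9.82, "month": "January"},
]
columns = ["month", "hours_of_daylight"]

# Fields are read by column name, so the key order inside each dict does not matter.
point = daylight_hours[0]
print(point["month"], point["hours_of_daylight"])   # March 11.98

# The columns metadata supplies a fixed label-then-value order when one is needed.
as_pair = [point[name] for name in columns]
print(as_pair)   # ['March', 11.98]
```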

Series Representation: dict of Values with Labels as Keys

A representation suited to quickly looking up values by their labels is a single dict, where each key-value pair is a series point, with the label being the key and the point value being the dict value:

# Series Representation
unemployment_rates = {
    2002: 6,
    2020: 4,
    2007: 5,
    2015: 6,
    2010: 11,
    2014: 7,
    2001: 5,
    2006: 5,
    2004: 6,
    2009: 8,
    2013: 8,
    2008: 5,
    2021: 7,
    2018: 4,
    2011: 10,
    2005: 6,
    2016: 5,
    2019: 4,
    2012: 9,
    2017: 5,
    2003: 6
}
# Label Metadata
columns = ["year", "unemployment_rate"]
label_order = [
    2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010,
    2011, 2012, 2013, 2014, 2015,
    2016, 2017, 2018, 2019, 2020,
    2021
]
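
The sketch below shows the kind of lookup this representation makes easy, and one possible use of the label_order metadata. It assumes a small daylight series named daylight_hours with its own month ordering, not the challenge data.

```python
# Three daylight points keyed by label, deliberately out of calendar order.
daylight_hours = {"March": 11.98, "January": 9.82, "February": 10.83}
label_order = ["January", "February", "March"]

# A value is found directly from its label, with no scanning.
print(daylight_hours["January"])     # 9.82

# Membership tests and .get() handle labels that may be missing.
print("April" in daylight_hours)     # False
print(daylight_hours.get("April"))   # None

# items() yields points in insertion order (guaranteed since Python 3.7).
for month, hours in daylight_hours.items():
    print(month, hours)

# Walking label_order instead visits the points in calendar order.
in_calendar_order = [[month, daylight_hours[month]] for month in label_order]
print(in_calendar_order)   # [['January', 9.82], ['February', 10.83], ['March', 11.98]]
```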

Series Representation: dict of lists of Values with Columns as Keys

The last representation for this challenge lists the series labels and values separately. It uses a dict, where each key is a column name and each value is a list of that column's values. The label list and the value list are ordered so that the elements at the same index in each list together form a series point:

# Series Representation
unemployment_rates = {
    'year': [
        2009, 2011, 2005, 2016, 2001,
        2012, 2019, 2007, 2021, 2018,
        2010, 2013, 2006, 2017, 2015,
        2014, 2008, 2004, 2003, 2020,
        2002
    ],
    'unemployment_rate': [
        8, 10, 6, 5, 5,
        9, 4, 5, 7, 4,
        11, 8, 5, 5, 6,
        7, 5, 6, 6, 4,
        6
    ]
}
# Label Metadata
columns = ["year", "unemployment_rate"]
label_order = [
    2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010,
    2011, 2012, 2013, 2014, 2015,
    2016, 2017, 2018, 2019, 2020,
    2021
]
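
Because the two lists are aligned by index, points can be rebuilt by pairing them up. The following is a minimal sketch using daylight points under assumed names, not the challenge data.

```python
# Three daylight points stored column by column.
daylight_hours = {
    "month": ["March", "January", "February"],
    "hours_of_daylight": [11.98, 9.82, 10.83],
}

# The same index in both lists identifies one point.
index = 1
print(daylight_hours["month"][index], daylight_hours["hours_of_daylight"][index])   # January 9.82

# zip() pairs the two lists element by element, producing one tuple per point.
points = list(zip(daylight_hours["month"], daylight_hours["hours_of_daylight"]))
print(points)   # [('March', 11.98), ('January', 9.82), ('February', 10.83)]

# Finding a label's position gives the index of its value in the other list.
position = daylight_hours["month"].index("February")
print(daylight_hours["hours_of_daylight"][position])   # 10.83
```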

Challenge Completion

Once you've completed all the challenges above: Congratulations! You're ready to start working with full data sets in Python! While data series are a start, data sets commonly have multiple columns of values associated with each label, and a data series is just one of those columns extracted together with the labels.
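
To make that relationship concrete, here is a small sketch of a data set with two value columns per label and a helper that pulls one column out as a series. The data set layout and the extract_series helper are assumptions for illustration. The rates come from three years of the challenge data, with the employment rate computed as 100 minus the unemployment rate.

```python
# A tiny data set: each row has one label (year) and two value columns.
data_set = [
    {"year": 2019, "unemployment_rate": 4, "employment_rate": 96},
    {"year": 2020, "unemployment_rate": 4, "employment_rate": 96},
    {"year": 2021, "unemployment_rate": 7, "employment_rate": 93},
]


def extract_series(rows, label_column, value_column):
    """Return one value column paired with its labels, as a list of label-value lists."""
    return [[row[label_column], row[value_column]] for row in rows]


employment_series = extract_series(data_set, "year", "employment_rate")
print(employment_series)   # [[2019, 96], [2020, 96], [2021, 93]]
```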