We have been delighted to work with the Education Policy Advisory Group of the Royal Statistical Society to produce some secondary mathematics resources which use real climate data.
This resource aims to provide mathematics teachers with a way of tying together the various graphical representations in GCSE Maths and Statistics, and Scottish National 5 Maths examinations. It uses real large data sets from the Met Office.
The emphasis is on teaching the statistical ideas using a meaningful context, rather than on trying to teach meteorology. These notes aim to give maths teachers enough contextual information about the data to use the resource with confidence in their classes. They also provide guidance on appropriate responses to the questions posed.
The ‘daily maximum temperatures’ are calculated as the average of the daily maximum temperatures recorded in each of the stations used. There are 540 stations across the 10 regions in the UK which feed into the data set used in these materials.
The materials use 4 regions spanning the geography of the UK – Northern Scotland, Northern Ireland, North West England & North Wales, South East and Central Southern England. We have constructed boxplots of 5 data sets all using maximum daily temperatures. There are obviously a huge number of similar potential data sets available for that weather variable, and for each of the other variables the Met Office have.
We have shown January and Winter average temperatures along with July, Summer and Annual average temperatures: we have forced a common scale to enable comparisons to be done visually without having to employ mental gymnastics in adjusting for the effects of different scales.
The resources can be used in different ways: they could be used while teaching individual topics, to supplement textbook and other resources with some real world data. However, they are particularly suited to being used together to revise statistical representations, and the relationships between the different representations.
NOTES:
- Autograph has been used to produce the Boxplots and Histograms for these resources. Autograph is free, so if you want to use different regions, it is not too difficult to import the data from the Excel data file with this resource into Autograph to generate the graphs. The Excel file has the summary statistics (the five-number summary plus mean and standard deviation) for each month, season and year at the bottom of the relevant column. The ten regions are each displayed in named sheets in the workbook.
- There are some suggested skills statements in each section. These relate to the questions already contained in the section, but you may want to expand the questions you ask in your classroom depending on time, and capabilities of the class.
- GCSE Maths (Higher Tier only) has boxplots, but does not address outliers in the graph, despite outliers themselves being required at Foundation level. They are required graphically in GCSE Statistics with boxplots. The bulk of the resource does not include outliers, but there is an optional extension where the boxplots do display outliers and which asks pupils to think about the value of having the individual outlier values visible.
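For teachers who prefer to recompute the summary statistics mentioned in the notes above rather than read them from the spreadsheet, the following is a minimal Python sketch. The temperature values are hypothetical, not taken from the Met Office file. The quartile convention (`method="inclusive"`) and the use of the sample standard deviation are assumptions, so results may differ slightly from those produced by Excel or Autograph.

```python
import statistics


def summary_statistics(values):
    """Return the five-number summary plus mean and standard deviation."""
    # Quartile conventions differ between tools; "inclusive" is an assumption here.
    lower_q, median, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "lower_quartile": lower_q,
        "median": median,
        "upper_quartile": upper_q,
        "max": max(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1); also an assumption.
        "std_dev": statistics.stdev(values),
    }


# Hypothetical January maximum temperatures (°C) for nine years.
example_values = [6.1, 7.4, 5.8, 8.0, 6.9, 7.2, 4.9, 7.7, 6.5]
example_summary = summary_statistics(example_values)

for name, value in example_summary.items():
    print(f"{name}: {value:.2f}")
```

The five values from `min` to `max` are exactly what a boxplot draws, which is why the boxplots can be rebuilt for any region from one column of the workbook.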
The student sheets below can be downloaded and edited.
Excel spreadsheet with source data
1) Autoscaling Issues
Despite technology offering multiple options to display tabular data in different formats, we spend almost no time in discussing / evaluating which displays show most clearly what you (the resource creator) want the reader (the audience) to focus on.
This is very much a classroom teaching activity as presented here.
The context can prompt discussions about why there has been such a shift from coal to wind and solar over this period – and what the future looks like.
2) What do Different Graph Types Show Best
Despite technology offering multiple options to display tabular data in different formats, we spend almost no time in discussing / evaluating which displays show most clearly what you (the resource creator) want the reader (the audience) to focus on.
All these displays show the same data (pretty much), but are not all equally easy to see a particular story in.
This is very much a classroom teaching activity as presented here.
3) Resources Using Graphs of Maximum Temperatures in the UK since 1884
There are 540 stations across the 10 regions in the UK which feed into the data set used in these materials.
The materials use 4 regions spanning the geography of the UK – Northern Scotland, Northern Ireland, North West England & North Wales, South East and Central Southern England. The ‘daily maximum temperatures’ are calculated as the average of the daily maximum temperatures recorded in each of the stations used.
We have constructed boxplots of 5 data sets all using maximum daily temperatures. There are obviously a huge number of similar potential data sets available for that weather variable, and for each of the other variables the Met Office have.
We have shown January and Winter average temperatures along with July, Summer and Annual average temperatures: we have forced a common scale to enable comparisons to be done visually without having to employ mental gymnastics in adjusting for the effects of different scales.
These three resources are designed to be taught consecutively.
Skills statements:
- Students should be able to compare 2 or more boxplots using data as evidence.
- Students should be able to make contextualised comments as to what the boxplots show, e.g. a higher median means a warmer temperature.
Student Worksheet 1 – Boxplots
More detailed comments on what the graphs show:
- For Jan / Winter the averages (centres) for N Ireland and Eng SE & Central S are very similar, but N Ireland has considerably less spread. Averages for Eng NW & N Wales are lower, and Scotland N is lower again. Scotland N & N Ireland have similar spreads, while Eng NW & N Wales has more variation, and Eng SE & Central S has more again.
- The story for July / Summer is very similar, except that the average in N Ireland is now similar to Eng NW & N Wales rather than Eng SE & Central S.
- In the extension question, the summer boxplots for N Ireland and Scotland N are roughly 10° to the right of the corresponding winter boxplots, but the Eng NW & N Wales and the Eng SE & Central S boxplots are about 15° to the right.
Commentary – Boxplots with Outliers
For Jan and winter data the only outliers are unusually cold years for that region, while for July and summer the only outliers are unusually hot years for that region.
While Eng SE & Central S has the highest median Jan and winter temperatures (and indeed highest max, highest UQ and highest LQ) it also has the most extreme cold Jan and winter temperatures recorded.
The only extra information with these diagrams is the detail of any outliers, so there is limited extra insight available.
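The notes do not say which rule was used to flag outliers in these boxplots. A minimal sketch, assuming the common convention that a value is an outlier if it lies more than 1.5 × IQR below the lower quartile or above the upper quartile, is given below. The data are hypothetical, and the quartile method is likewise an assumption.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return (low_outliers, high_outliers) using the k * IQR fence rule."""
    lower_q, _, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    iqr = upper_q - lower_q
    lower_fence = lower_q - k * iqr
    upper_fence = upper_q + k * iqr
    low = [v for v in values if v < lower_fence]
    high = [v for v in values if v > upper_fence]
    return low, high


# Hypothetical winter maximum temperatures (°C) including one unusually cold year.
winter_values = [7.0, 7.5, 6.8, 7.2, 8.1, 6.5, 7.9, 7.3, 2.0]
low_outliers, high_outliers = find_outliers(winter_values)

print("Unusually cold years:", low_outliers)
print("Unusually warm years:", high_outliers)
```

In this made-up winter series only the cold year is flagged, which mirrors the pattern described above for the January and winter data.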
Student Worksheet 2 – Histograms
Commentary:
(these descriptions are not the only way to answer the questions)
- (i) Eng SE + Central S (ii) Scotland N
- The temperatures in N Ireland have less variability; the top end of the Eng NW + N Wales data is hotter than for N Ireland, and the median will be higher for Eng NW + N Wales.
- These two regions have similar variability, but Eng SE + Central S is (about) 4 – 5 degrees hotter on average than Eng NW + N Wales.
- The big advantage of using the same scales for all 4 graphs is that you are forced to compare like with like, where if the scales are different you really have to work hard in order to compare like with like. If you wanted to only look at Scotland N then scaling it so that you lost all that unused space in the right half would be good.
- (i) No (ii) No. Because histograms plot the frequency density of the data, changing the intervals does not dramatically affect the visual shape; the details change only a little (unless it is a very unusual data set). This is the strength of the display: a bad actor can’t pick intervals to distort the impression given by the data. Since these data sets have 141 values each, it is not surprising that changing the intervals makes no substantive difference.
- Boxplots are five-number summaries of a data set, so quick comparisons favour using boxplots. The histograms provide much more detail and the capacity for more detailed comparisons when required. The statistical issue in e) here is that where data is plentiful, smaller intervals give more detail that is reasonably stable. Where data is scarce, it is more subject to the vagaries of randomness, and it is tempting for the user to over-interpret what the data is saying, i.e. to look for an explanation for those scarce data points appearing in the particular place they did.
Students don’t often meet situations where they are asked which representation is best. Once they leave education, however, if they write any report using data (in any discipline), the software will do the donkey work, but they will need the skills to decide which representation to use and how to handle issues such as whether to allow autoscaling (usually the default) or to force equal scales so that like-for-like comparisons can be made easily.
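The point about frequency density in the answers above can be checked with a small calculation. The sketch below uses hypothetical temperatures and class boundaries (neither is taken from the worksheet). It computes frequency density as frequency divided by class width for two different choices of interval, and confirms that the total area of the histogram is the same in both cases.

```python
def frequency_densities(values, boundaries):
    """Return (lower, upper, frequency, density) for each class [lower, upper)."""
    rows = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        frequency = sum(1 for v in values if lower <= v < upper)
        density = frequency / (upper - lower)  # frequency density = frequency / class width
        rows.append((lower, upper, frequency, density))
    return rows


def total_area(rows):
    """Area of the histogram: sum of density * class width, i.e. the number of values."""
    return sum(density * (upper - lower) for lower, upper, _, density in rows)


# Hypothetical summer maximum temperatures (°C).
temperatures = [16.2, 17.1, 17.4, 17.8, 18.0, 18.3, 18.5, 18.9, 19.2, 19.6, 20.4, 21.1]

wide_classes = frequency_densities(temperatures, [16, 18, 20, 22])
narrow_classes = frequency_densities(temperatures, [16, 17, 18, 19, 20, 21, 22])

for label, rows in (("2-degree classes", wide_classes), ("1-degree classes", narrow_classes)):
    print(label, [density for *_, density in rows], "area =", total_area(rows))
```

With only twelve values the narrower classes already look more jagged, which illustrates the warning about over-interpreting scarce data.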
Student Worksheet 3 – Time Series
Where the maths curriculum deals with time series it has a primary focus on calculations – typically of moving averages in a context where some cyclical pattern (‘seasonality’ even if cycle is weekly or daily) is present, and the behaviour can be modelled by season + trend + variation. However, time series data are critical to understanding a wide range of scientific, historical, and social science phenomena.
This section is intended as an extension to show the timelines of the data because that is a very important context in terms of weather measurements – for example, if the same maximum temperatures were recorded, but they occurred in strictly ascending order then the boxplots would be exactly the same. However, a hugely different interpretation would be appropriate. The amount of (chaotic) variation from year to year makes it much more difficult to discern long term trends. If you want more information on this, have a look at https://www.metlink.org/blog/weather-climate-and-chaos-theory/.
Again, the axes here are forced to be consistent with one another to facilitate accurate comparison visually. However, apart from ensuring the vertical axis always shows zero or a broken scale, the scale for different times of year (in the following panels) will be allowed to vary, because we are looking at the stories in 4 time series and the corresponding boxplots.
There will be quickly diminishing returns on the time taken – you could treat the following as two pairs (Jan + winter, and July + summer) where the vertical scales are the same in terms of numbers – to allow direct comparisons. The stories in each pair are very similar, so looking at one pair only, or looking very quickly at the second pair – asking ‘do we see a similar story here’ is likely to be sufficient.
Note all 5 panels have a range of 16 degrees, so variability can have visual ‘like for like’ comparisons.
Commentary:
Comparing the annual temperatures in the time series and in the boxplots, you can see in both that there are broadly similar amounts of variation in the 4 regions, centred round different temperatures, and a feeling that temperatures trend up a bit over time, though this is hard to be sure of because of the amount of (chaotic*) variation from year to year. If the data values had occurred in strictly ascending order over the 141 years, the ‘time series’ would never have fallen as you move from 1884 to 2024, so it would look very different, but the boxplots would have been identical. The timeline is therefore an extremely important component of trying to understand weather and climate, and what is happening with climate change.
When you look at the January and Winter data, there is substantially more variation in the Eng SE + Central S region than in the other 3 and this corresponds to its time series showing more extreme fluctuations than the other time series.
* Chaotic variation is what meteorologists call this. Mathematicians & statisticians would refer to it as ‘random variation’ but, as with many other phenomena, using ‘random’ to describe it only reflects that we do not yet understand the process well enough to be able to predict outcomes, e.g. turbulence around a Formula 1 car.
‘Chaotic variation’ captures that aspect of the variation much better than ‘random variation’.
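The observation in the commentary that the same values in strictly ascending order would give identical boxplots but a very different time series can be demonstrated directly. The sketch below uses a short hypothetical series of annual temperatures. The quartile method is an assumption and does not affect the conclusion.

```python
import statistics


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    lower_q, median, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), lower_q, median, upper_q, max(values))


def never_falls(series):
    """True if each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(series, series[1:]))


# Hypothetical annual maximum temperatures (°C) in the order they were recorded.
recorded_order = [12.1, 13.4, 11.8, 12.9, 13.0, 12.2, 13.8, 12.6]
ascending_order = sorted(recorded_order)

print("Same boxplot:", five_number_summary(recorded_order) == five_number_summary(ascending_order))
print("Recorded series never falls:", never_falls(recorded_order))
print("Ascending series never falls:", never_falls(ascending_order))
```

A boxplot discards the order of the values entirely, so only a time series can show whether the warmer years are clustered towards the end of the record.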
Gentle extensions of time series and multiple representations
One of the difficulties in dealing with time series when there is a lot of variation, as here, is to try to identify if there is any long term change underlying the process. Meteorologists describe the behaviour in temperatures as ‘chaotic variation’ – which is a very good descriptor of what it looks like. There are some things we can do to try to make it easier to identify any long term changes. One is to smooth the data by taking an average of a number of years – but how many years is best?
Using Multiple Representations
One thing that mathematicians and statisticians do to try to get fuller understanding of a problem is to look at multiple representations.
This section is very short to do with a class and its purpose is just to show how accessible the stories in data can be – without complicated statistical techniques, but using the simple graphs they know to visualise how the data behaves, and to show the power of using more than one representation to develop a fuller understanding of the stories in the data.
One of the difficulties in dealing with time series when there is a lot of variation, as here, is the temptation to try to explain every movement. There is a narrative about losing detail if you take too long a period, including not getting the next 15 year average until 2033 (the next 25 year average is actually available at the same time), whereas the next 3 year average is available in 2027 and the next 5 year average in 2028. There is no ‘right answer’ to a best time period: there is a trade-off between the detail of the ‘chaotic variation’, most evident in the single year data, and seeing an upward trend in the data, which is more evident as the time period increases.
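The availability dates quoted above can be reproduced with a short calculation. The sketch below assumes that the averages are taken over consecutive, non-overlapping blocks of years starting in 1884, with data currently running to 2024. That assumption is not stated in the notes, but it gives the same years. The function names and the short temperature series are hypothetical.

```python
def block_averages(years, values, block_length):
    """Average consecutive, non-overlapping blocks; an incomplete final block is dropped."""
    averages = []
    for start in range(0, len(values) - block_length + 1, block_length):
        block = values[start:start + block_length]
        first_year = years[start]
        last_year = years[start + block_length - 1]
        averages.append((first_year, last_year, sum(block) / block_length))
    return averages


def next_block_end(first_year, last_year, block_length):
    """Year in which the next complete block of data becomes available."""
    years_so_far = last_year - first_year + 1
    complete_blocks = years_so_far // block_length
    return first_year + (complete_blocks + 1) * block_length - 1


# Hypothetical six-year series, averaged in blocks of three years.
demo_years = [2001, 2002, 2003, 2004, 2005, 2006]
demo_values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
demo_blocks = block_averages(demo_years, demo_values, 3)
print(demo_blocks)

# When does the next average arrive for each block length, given data for 1884 to 2024?
next_years = {length: next_block_end(1884, 2024, length) for length in (3, 5, 15, 25)}
print(next_years)
```

Longer blocks smooth out more of the year-to-year variation but produce far fewer points and a longer wait for the next one, which is the trade-off the class is asked to weigh.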
The table below shows the highest 20 average annual maximum temperatures in this region between 1884 and 2024. There are 19 which are above 15°C of which 12 are this century.
Other noteworthy observations from this table: All of the top five were recorded since 2014, and 8 of the top 10 were recorded since 2003.
| Year | Annual average maximum temperature (°C) |
| --- | --- |
| 2022 | 16.075 |
| 2014 | 15.63333 |
| 2020 | 15.63333 |
| 2023 | 15.625 |
| 2018 | 15.41667 |
| 1989 | 15.3 |
| 2011 | 15.3 |
| 2003 | 15.28333 |
| 1921 | 15.275 |
| 2024 | 15.2 |
| 1949 | 15.19167 |
| 2006 | 15.19167 |
| 1990 | 15.175 |
| 2019 | 15.13333 |
| 2007 | 15.075 |
| 2017 | 15.075 |
| 1995 | 15.075 |
| 1959 | 15.01667 |
| 1999 | 14.93333 |
Note of caution: this table, and the graphs are for the highest average maximum temperatures – news reports on ‘highest annual temperatures’ are normally based on the mean temperature (average of maximum and minimum daily temperatures), so the rank order doesn’t match exactly to this table.
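Pupils can check the observations about recent years for themselves. The sketch below uses only the rows reproduced in the table above, copied as (year, value) pairs. It treats "this century" as 2001 onwards, which is an assumption (counting from 2000 gives the same result here).

```python
# Rows copied from the table above: (year, annual average maximum temperature in °C).
table_rows = [
    (2022, 16.075), (2014, 15.63333), (2020, 15.63333), (2023, 15.625),
    (2018, 15.41667), (1989, 15.3), (2011, 15.3), (2003, 15.28333),
    (1921, 15.275), (2024, 15.2), (1949, 15.19167), (2006, 15.19167),
    (1990, 15.175), (2019, 15.13333), (2007, 15.075), (2017, 15.075),
    (1995, 15.075), (1959, 15.01667), (1999, 14.93333),
]

# Rank from warmest to coolest.
ranked = sorted(table_rows, key=lambda row: row[1], reverse=True)

years_this_century = [year for year, _ in table_rows if year >= 2001]
top_five_all_since_2014 = all(year >= 2014 for year, _ in ranked[:5])
top_ten_since_2003 = sum(1 for year, _ in ranked[:10] if year >= 2003)

print("Years this century in the table:", len(years_this_century))
print("All of the top five since 2014:", top_five_all_since_2014)
print("Number of the top ten since 2003:", top_ten_since_2003)
```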
Student Worksheet 4 – Scatter graphs
Before showing any data, or any maths questions, ask the pupils to reflect on the following question for a minute, and then discuss it for a couple of minutes with your neighbour:
- Do you think that the spring maximum temperature would be a good predictor of the autumn maximum temperature?
Commentary:
Note: It is important in looking at these activities that association (correlation) should not be confused with causation.
Q1 – Spring & Autumn data for N Ireland in 2005 – 2024 and 1885 – 1904
Neither scatter diagram shows much correlation so for these periods the spring temperature does not give you any substantive indication of what the autumn temperature will be.
The temperatures in the 2005 – 2024 period seem to be about 1 to 1.5 degrees warmer, on average, than the temperatures in 1885 – 1904 in the same season.
Q2 – Summer and Winter data for Eng SE + Central S in 2005 – 2024 and 1885 – 1904
The story is very similar to what was seen in the other two seasons, in a different region.
Q3 – Here there is fairly strong correlation between the two temperatures, which means that knowledge of the value of one would give you a reasonable prediction of what the other one is – not that there is a causal effect, but both are the result of the prevailing meteorological conditions over the UK.
However – there are systematic differences between the weather in the two regions due to their geographical characteristics – and part b) draws attention to this – a line of best fit to the data (by eye) would not be too far away from parallel to the equal temperature line shown, but roughly 5 degrees lower. It wouldn’t change my view that it is a reasonably good predictor (because of the strong correlation) but it would help me to identify how I would make the prediction.
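For classes that go beyond judging correlation by eye, the ideas in Q3 can be sketched numerically. The example below uses hypothetical paired temperatures for a warmer and a cooler region, not the worksheet data. It computes the product-moment correlation coefficient and the average offset between the two regions, then uses that offset to make a prediction along a line parallel to the equal temperature line.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_offset(xs, ys):
    """Average of (y - x): how far the points sit above or below the line y = x."""
    return sum(y - x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical summer maximum temperatures (°C) for the same six years.
warmer_region = [20.1, 21.5, 19.8, 22.3, 20.9, 21.0]
cooler_region = [15.3, 16.2, 15.0, 17.5, 15.7, 16.1]

r = pearson_r(warmer_region, cooler_region)
offset = mean_offset(warmer_region, cooler_region)

# Predict the cooler region from the warmer one using a line parallel to y = x.
predicted_cooler = 21.0 + offset

print(f"correlation = {r:.2f}, offset = {offset:.1f}, prediction = {predicted_cooler:.1f}")
```

A strong correlation here says only that the two regions tend to move together. As the commentary stresses, it does not mean that the temperature in one region causes the temperature in the other.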