An error bar is drawn on a graph to show the variability of the data, that is, the error or uncertainty in a reported measurement. It usually represents one standard deviation, one standard error, or a particular confidence interval (e.g., a 90% interval). The Python library ‘Matplotlib’ provides the function ‘errorbar()’ to plot error bars for given data. The parameters ‘xerr’ and ‘yerr’ set the spread of the data along the X-axis and Y-axis, respectively. This post contains simple Python code to generate error bars. Instead of using the standard deviation or a percentile for ‘yerr’, I will show how to use the largest value as the upper limit and the smallest value as the lower limit of each error bar.
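For comparison with the min/max approach, here is a minimal sketch of the conventional choices mentioned above, computed with NumPy for one group of samples. The function name `error_half_widths` and the `coverage` parameter are illustrative, not part of Matplotlib. It assumes the sample standard deviation (`ddof=1`) and, for the "percentile" option, the central 90% of the data (5th to 95th percentile), which is a spread of the data rather than a formal confidence interval for the mean. Each result is a (lower, upper) distance from the mean, the same per-point shape that ‘yerr’ accepts for asymmetric bars.

```python
import numpy as np


def error_half_widths(samples, coverage=0.90):
    """Return (lower, upper) distances from the mean for three common error-bar choices."""
    v = np.asarray(samples, dtype=float)
    mean = v.mean()
    sd = v.std(ddof=1)              # sample standard deviation
    sem = sd / np.sqrt(v.size)      # standard error of the mean
    tail = (1.0 - coverage) / 2.0 * 100.0
    # Percentile bounds of the central `coverage` share of the data
    lo, hi = np.percentile(v, [tail, 100.0 - tail])
    return {
        "std": (sd, sd),            # symmetric bar
        "sem": (sem, sem),          # symmetric bar
        # Percentile bars are asymmetric; clip so bar lengths are never negative
        "percentile": (max(mean - lo, 0.0), max(hi - mean, 0.0)),
    }


samples = [0.2022879684418146, 0.20131986960324402, 0.16179123404929858, 0.20574313663616284]
for name, (lower, upper) in error_half_widths(samples).items():
    print(f"{name:>10}: -{lower:.4f} / +{upper:.4f}")
```

Symmetric choices (standard deviation, standard error) can also be passed to ‘yerr’ as a single value per point; only asymmetric bars need the separate lower and upper rows.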
import matplotlib.pyplot as plt
import numpy as np
# data for plot: each key is an x value, each list holds the measurements recorded for that x value
pdata = {1: [0.04238164151333471, 0.04854468785354314, 0.02553415061295972, 0.04136076931030917, 0.04798681907115642,
0.049232670717890614, 0.047310500874215775, 0.0418046357615894, 0.085229244114002476, 0.047044258743423086,
0.04622368964094098],
2: [0.12694300518134716, 0.1342158485015628, 0.10350425785184508, 0.12493049119555144, 0.13096437402692554,
0.13071895424836602, 0.12580943570767808, 0.18322323148912743, 0.12730627306273062, 0.13182984965163183,
0.13324780058651026],
3: [0.2022879684418146, 0.20131986960324402, 0.16179123404929858, 0.20574313663616284, 0.20381254964257348,
0.2034354468455632, 0.20456692913385827, 0.28718449661257288, 0.20503882110600538, 0.20181890075128509,
0.20381359284753542],
4: [0.24600178461116068, 0.14650205761316873, 0.2430870821295914, 0.24599659284497444, 0.24529339013329668,
0.24625804266847273, 0.24340715502555366, 0.34704999658959143, 0.2470059880239521, 0.24514894779994534,
0.24448870269899045],
5: [0.2642599277978339, 0.26820381100292695, 0.2641418983700863, 0.26618791143028125, 0.26842920090508515,
0.2653232577665827, 0.22681747827122276, 0.2650832487915498, 0.31677039076057685, 0.2660682226211849,
0.2666986737082158],
6: [0.2724514240085174, 0.2333927434754933, 0.2697639565380292, 0.2758982979617567, 0.2758987449478834,
0.27244451473456693, 0.27401040008489863, 0.27309536494405967, 0.30120986604295133, 0.2756440651177493,
0.27593162756131223],
7: [0.2273113229922679, 0.2778406946017365, 0.274887246012926, 0.27776989643921124, 0.2760084925690021,
0.2782322519981194, 0.2765856437523737, 0.2755868544600939, 0.31704324095421823, 0.27536908988974024,
0.2781901564937828],
8: [0.2140753366286007, 0.2734444965670178, 0.27441920164292133, 0.2742452064739292, 0.2751979456451958,
0.2750682128240109, 0.27406018242264085, 0.294471434040924, 0.2736806060865471, 0.2737306843267108,
0.27457296655787733],
9: [0.2383345023770556, 0.2694502968779927, 0.2700543056633049, 0.27032661570535094, 0.26904935383671436,
0.2689864890873398, 0.27072679373934605, 0.3290017783963504, 0.26925601328134047, 0.26818357115310454,
0.27049749325106054],
10: [0.2235087719298246, 0.2645703633357954, 0.26266877082237416, 0.26326739306745656, 0.26240414580342447,
0.26372852717544354, 0.26377938960720415, 0.31279613411953523, 0.2633891213389121, 0.26279527559055116,
0.2640897227904352]}
# plot error bar
x = []
y = []
yerr = []
for k, v in pdata.items():
    x.append(k)
    m = np.mean(v)  # compute mean once per group
    y.append(m)
    yerr.append([m - min(v), max(v) - m])  # distance from the mean down to the min and up to the max
yerr = np.transpose(yerr)  # errorbar() expects a 2xN array: row 0 = lower errors, row 1 = upper errors
plt.errorbar(x, y, yerr=yerr, color='r')
plt.xlabel("Scale_pos_weight")
plt.ylabel('value')
plt.title("Error bar plot")
plt.grid(True, which='major', color='b', linestyle='-')  # the 'b' keyword was renamed 'visible' in newer Matplotlib releases
plt.savefig('ml_results.png')
plt.show()
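For each point (x, y), I generate ‘yerr’ as [np.mean(v) - min(v), max(v) - np.mean(v)], i.e. how far the minimum value lies below the mean and how far the maximum value lies above it. The bar at each x therefore spans exactly from the smallest to the largest measurement.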
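The same 2xN array can also be built in one pass. The sketch below is an illustrative alternative, not the post's original code; the helper name `minmax_yerr` is hypothetical. It takes a dict shaped like ‘pdata’, sorts the keys (an assumption that keeps the connecting line monotone in x), and stacks the lower distances (mean - min) in row 0 and the upper distances (max - mean) in row 1.

```python
import numpy as np


def minmax_yerr(groups):
    """Return x values, group means, and a 2xN yerr spanning each group's min..max."""
    keys = sorted(groups)                       # plain Python keys, sorted by x
    values = [np.asarray(groups[k], dtype=float) for k in keys]
    means = np.array([v.mean() for v in values])
    mins = np.array([v.min() for v in values])
    maxs = np.array([v.max() for v in values])
    # Row 0: distance below the mean; row 1: distance above it
    yerr = np.vstack([means - mins, maxs - means])
    return np.array(keys), means, yerr


demo = {1: [0.2, 0.4, 0.9], 2: [1.0, 1.5, 2.0]}
x, y, yerr = minmax_yerr(demo)
print(yerr.shape)                  # (2, 2)
print(y - yerr[0], y + yerr[1])    # recovers each group's min and max
```

The returned arrays can be passed straight to `plt.errorbar(x, y, yerr=yerr, color='r')` as in the script above. Checking that `y - yerr[0]` and `y + yerr[1]` reproduce the minimum and maximum is a quick way to confirm the rows are in the order Matplotlib expects.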