{"id":4777,"date":"2016-09-13T09:43:26","date_gmt":"2016-09-13T08:43:26","guid":{"rendered":"https:\/\/stevepedwards.today\/DebianAdmin\/?p=4777"},"modified":"2023-10-28T23:10:43","modified_gmt":"2023-10-28T22:10:43","slug":"awk-as-a-limited-spreadsheet-simulation-for-stats","status":"publish","type":"post","link":"https:\/\/stevepedwards.today\/DebianAdmin\/awk-as-a-limited-spreadsheet-simulation-for-stats\/","title":{"rendered":"AWK as a Limited Spreadsheet Simulation for Stats"},"content":{"rendered":"<p>AWK can be used in a non-GUI environment as a pretty impressive, if limited, spreadsheet simulation to 
generate the correct data at each stage of a basic stats process that you may do in a tech maths class, for example. It is limited by its built-in maths functions, which are still considerable:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC124\">Calling Built-in<\/a>: How to call built-in functions.<\/li>\n<li><a href=\"https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC125\">Numeric Functions<\/a>: Functions that work with numbers, including <code>int<\/code>, <code>sin<\/code> and <code>rand<\/code>.<\/li>\n<li><a href=\"https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC126\">String Functions<\/a>: Functions for string manipulation, such as <code>split<\/code>, <code>match<\/code>, and <code>sprintf<\/code>.<\/li>\n<li><a href=\"https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC127\">I\/O Functions<\/a>: Functions for files and shell commands.<\/li>\n<li><a href=\"https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC128\">Time Functions<\/a>: Functions for dealing with time stamps.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC125\"><span style=\"color: #00ff00;\">https:\/\/www.math.utah.edu\/docs\/info\/gawk_13.html#SEC125<\/span><\/a><\/p>\n<p>In statistics, the first stage is usually to find the mean (average) value of a data set, which is the sum of the samples divided by the number of samples.<\/p>\n<p>This example uses a simple data set of 20 \"travel times\": <span style=\"color: #0000ff;\">cat traveltimes.txt<\/span><\/p>\n<p><span style=\"color: #ff0000;\">26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34<\/span><\/p>\n<p>taken from <span style=\"color: #00ff00;\"><a style=\"color: #00ff00;\" href=\"https:\/\/www.mathsisfun.com\/data\/standard-normal-distribution.html\">https:\/\/www.mathsisfun.com\/data\/standard-normal-distribution.html<\/a><\/span><\/p>\n<p>The results can be checked here:<\/p>\n<p><span style=\"color: #00ff00;\"><a style=\"color: 
#00ff00;\" href=\"https:\/\/www.mathsisfun.com\/data\/standard-deviation-calculator.html\">https:\/\/www.mathsisfun.com\/data\/standard-deviation-calculator.html<\/a><\/span><\/p>\n<p>First, in the BEGIN (pre-loop) section, set the record separator RS to the comma-and-space delimiter. Each comma-separated value in traveltimes.txt then becomes its own record, and the sample list above prints as a column. POSIX leaves a multi-character RS like this unspecified, but gawk and mawk treat it as a regular expression:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1}' traveltimes.txt<\/span><br \/>\n<span style=\"color: #ff0000;\">26<\/span><br \/>\n<span style=\"color: #ff0000;\">33<\/span><br \/>\n<span style=\"color: #ff0000;\">65<\/span><br \/>\n<span style=\"color: #ff0000;\">28<\/span><br \/>\n<span style=\"color: #ff0000;\">34<\/span><br \/>\n<span style=\"color: #ff0000;\">55<\/span><br \/>\n<span style=\"color: #ff0000;\">25<\/span><br \/>\n<span style=\"color: #ff0000;\">44<\/span><br \/>\n<span style=\"color: #ff0000;\">50<\/span><br \/>\n<span style=\"color: #ff0000;\">36<\/span><br \/>\n<span style=\"color: #ff0000;\">26<\/span><br \/>\n<span style=\"color: #ff0000;\">37<\/span><br \/>\n<span style=\"color: #ff0000;\">43<\/span><br \/>\n<span style=\"color: #ff0000;\">62<\/span><br \/>\n<span style=\"color: #ff0000;\">35<\/span><br \/>\n<span style=\"color: #ff0000;\">38<\/span><br \/>\n<span style=\"color: #ff0000;\">45<\/span><br \/>\n<span style=\"color: #ff0000;\">32<\/span><br \/>\n<span style=\"color: #ff0000;\">28<\/span><br \/>\n<span style=\"color: #ff0000;\">34<\/span><\/p>\n<p>Now the sample values in column $1 can be summed and, as NR (Number 
of Records) is maintained automatically by awk, it can be used to divide the total, giving both the average and the count:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1} {sum +=$1; avg=(sum\/NR)} END{print <strong>sum, avg, NR<\/strong>}' traveltimes.txt<\/span><br \/>\n<span style=\"color: #ff0000;\">26<\/span><br \/>\n<span style=\"color: #ff0000;\">33<\/span><br \/>\n<span style=\"color: #ff0000;\">65<\/span><br \/>\n<span style=\"color: #ff0000;\">28<\/span><br \/>\n<span style=\"color: #ff0000;\">34<\/span><br \/>\n<span style=\"color: #ff0000;\">55<\/span><br \/>\n<span style=\"color: #ff0000;\">25<\/span><br \/>\n<span style=\"color: #ff0000;\">44<\/span><br \/>\n<span style=\"color: #ff0000;\">50<\/span><br \/>\n<span style=\"color: #ff0000;\">36<\/span><br \/>\n<span style=\"color: #ff0000;\">26<\/span><br \/>\n<span style=\"color: #ff0000;\">37<\/span><br \/>\n<span style=\"color: #ff0000;\">43<\/span><br \/>\n<span style=\"color: #ff0000;\">62<\/span><br \/>\n<span style=\"color: #ff0000;\">35<\/span><br \/>\n<span style=\"color: #ff0000;\">38<\/span><br \/>\n<span style=\"color: #ff0000;\">45<\/span><br \/>\n<span style=\"color: #ff0000;\">32<\/span><br \/>\n<span style=\"color: #ff0000;\">28<\/span><br \/>\n<span style=\"color: #ff0000;\">34<\/span><br \/>\n<strong><span style=\"color: #ff0000;\">776 38.8 20<\/span><\/strong><\/p>\n<p>These match the results from the webpage calculator above:<\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/SDCalc.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4788\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/SDCalc.png\" alt=\"SDCalc.png\" width=\"816\" height=\"408\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqrres.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4789\" 
src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqrres.png\" alt=\"sqrres.png\" width=\"729\" height=\"218\" \/><\/a><\/p>\n<p>The mean (avg) is 38.8, so subtract this from $1 and print the difference for each record, with tabs (\\t) added to tidy the columns:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\" <strong>($1-38.8)<\/strong>} {sum +=$1; avg=(sum\/NR) } END{print sum, avg, NR}' traveltimes.txt<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/minusavg.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4790\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/minusavg.png\" alt=\"minusavg.png\" width=\"800\" height=\"552\" \/><\/a><\/p>\n<p>Now calculate the variance:<\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/variance.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4791\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/variance.png\" alt=\"variance.png\" width=\"774\" height=\"141\" \/><\/a><\/p>\n<p>The first step is to square each of the column 2 differences, again separated by tabs for clarity:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\" ($1-38.8) \"\\t\" (($1-38.8)*($1-38.8))} {sum +=$1; avg=(sum\/NR) } END{print sum, avg, NR}' traveltimes.txt<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/varianceterm.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4793\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/varianceterm.png\" alt=\"varianceterm.png\" width=\"705\" height=\"403\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqrres-1.png\"><img 
loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4792\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqrres-1.png\" alt=\"sqrres.png\" width=\"779\" height=\"233\" \/><\/a><\/p>\n<p>Each column can now be accumulated in its own variable in the main body and printed in the END section: sum2 holds the differences and sum3 the squared differences. The strange-looking exponential value for sum2 is just floating-point rounding error. With exact arithmetic, the differences from the mean would sum to zero:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8))} {sum +=$1; avg=(sum\/NR); sum2+=($1-38.8); sum3+=(($1-38.8)*($1-38.8)) } END{print sum, avg, NR, \"\\t\" sum2, \"\\t\" sum3}' traveltimes.txt<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/3col.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4794\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/3col.png\" alt=\"3col.png\" width=\"728\" height=\"473\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/variance-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4795\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/variance-1.png\" alt=\"variance.png\" width=\"752\" height=\"137\" \/><\/a><\/p>\n<p>The variance is the sum of the squared differences (2599.2) divided by the number of samples (20). This is the population variance; dividing by NR-1 (19) instead would give the sample variance, 136.8. The divisor is typed in as 20 here, but NR still holds the final record count in the END section, so sum3\/NR would give the same result:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8))} {sum +=$1; avg=(sum\/NR); sum2+=($1-38.8); sum3+=(($1-38.8)*($1-38.8)) } END{print sum, avg, <strong>NR<\/strong>, \"\\t\" sum2, \"\\t\" sum3, <strong>sum3\/20<\/strong>}' 
traveltimes.txt<\/span><\/p>\n<p><span style=\"color: #ff0000;\">776 38.8 20 5.68434e-14 2599.2 <strong>129.96<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/variance-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4795\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/variance-1.png\" alt=\"variance.png\" width=\"802\" height=\"146\" \/><\/a><\/p>\n<p>The standard deviation is the square root of the variance, so it needs just one more calculation in the END print arguments:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8))} {sum +=$1; avg=(sum\/NR); sum2+=($1-38.8); sum3+=(($1-38.8)*($1-38.8)) } END{print sum, avg, NR, \"\\t\" sum2, \"\\t\" sum3, sum3\/20, <strong>sqrt(sum3\/20)<\/strong>}' traveltimes.txt<\/span><\/p>\n<p><span style=\"color: #ff0000;\">776 38.8 20 5.68434e-14 2599.2 129.96 <strong>11.4<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqVariance.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4797\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqVariance.png\" alt=\"sqVariance.png\" width=\"818\" height=\"105\" \/><\/a><\/p>\n<p>To print all the columns, sorted for comparison with the LibreOffice Calc version below, add a per-record column of each squared difference divided by 20 and pipe the output through sort -n. The tiny exponential value could be rounded to 0 in awk with printf, but that complicates the command. I had to round it in LibreOffice as well, which showed the same small value as awk:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8)), \"\\t\" (($1-38.8)*($1-38.8))\/20} {sum +=$1; avg=(sum\/NR); sum2+=($1-38.8); sum3+=(($1-38.8)*($1-38.8)) } END{print sum, avg, NR, \"\\t\" sum2, \"\\t\" sum3, \"\\t\" sum3\/20, \"\\t\" sqrt(sum3\/20)}' 
traveltimes.txt | sort -n<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sortedall.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4798\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sortedall.png\" alt=\"sortedall.png\" width=\"698\" height=\"471\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sortedlibre.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4799\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sortedlibre.png\" alt=\"sortedlibre.png\" width=\"616\" height=\"483\" \/><\/a><\/p>\n<p>Awk is pretty impressive as a text-based spreadsheet imitator, eh?!<\/p>\n<p>If you want to know how many standard deviations away from the mean each value is, remove the END section and divide the column 2 difference ($1-38.8) by the standard deviation (11.4) to give the \"Z\" score:<\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/Standard_Score_Calc.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4820\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/Standard_Score_Calc.gif\" alt=\"standard_score_calc.gif\" width=\"748\" height=\"195\" \/><\/a><\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8)), \"\\t\" (($1-38.8)*($1-38.8))\/20, \"\\t\" <strong>($1-38.8)\/11.4<\/strong>} {sum +=$1; avg=(sum\/NR); sum2+=($1-38.8); sum3+=(($1-38.8)*($1-38.8)) }' traveltimes.txt | sort -n<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/stdevs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4813\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/stdevs.png\" 
alt=\"stdevs.png\" width=\"658\" height=\"505\" \/><\/a><\/p>\n<p>If you need to remove the negative signs, note that awk has no built-in abs() function. Instead, square each value and then take the square root; a space is also added before the tab:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8)), \"\\t\" (($1-38.8)*($1-38.8))\/20, \"\\t\" ($1-38.8)\/11.4 \" \\t\" sqrt( (($1-38.8)\/11.4)*(($1-38.8)\/11.4) )} {sum +=$1; avg=(sum\/NR); sum2+=($1-38.8); sum3+=(($1-38.8)*($1-38.8)) }' traveltimes.txt | sort -n<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqrtstdevs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4815\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/sqrtstdevs.png\" alt=\"sqrtstdevs.png\" width=\"658\" height=\"454\" \/><\/a><\/p>\n<p>Although not precise, you can generate an approximate ASCII bell curve of the standard normal distribution probabilities. Amend the above commands and pipe the output through the ASCII chart one-liner from an earlier post.<\/p>\n<p>Since you can get the z-scores from the formula above, you can also find how far each value is from the mean as a proportion of the mean (treating the mean as 1). Divide each sample's difference from the mean by the mean, square and square-root again to remove negatives, then subtract the result from 1 to get a proportional height for a rough bell curve:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8)), \"\\t\" (($1-38.8)*($1-38.8))\/20, \"\\t\" ($1-38.8)\/11.4 \" \\t\" sqrt( (($1-38.8)\/11.4)*(($1-38.8)\/11.4)) <strong>\" \\t\" ($1-38.8)\/38.8 \" \\t\" sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8) ) \" \\t\" 1-sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8) )<\/strong>} ' traveltimes.txt | sort -n<\/span><\/p>\n<p><a 
href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/props.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4824\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/props.png\" alt=\"props.png\" width=\"1010\" height=\"437\" \/><\/a><\/p>\n<p>You should be able to see how I got this \"bell\" view. The command below reuses the columns from the commands above and adds a final column that multiplies the proportions by 100. It then pipes the result through the ASCII generator:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8)), \"\\t\" (($1-38.8)*($1-38.8))\/20, \"\\t\" ($1-38.8)\/11.4 \" \\t\" sqrt( (($1-38.8)\/11.4)*(($1-38.8)\/11.4)) \" \\t\" ($1-38.8)\/38.8 \" \\t\" sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8) ) \" \\t\" 1-sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8) ) \" \\t\" <strong>100*(1-sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8)))<\/strong> }' traveltimes.txt | sort -n | <strong>awk '{print $1,$9}' | awk '{print $1,$2}' | awk '!max{max=$2;}{r=\"\";i=s=50*$2\/max;while(i--&gt;0)r=r\"#\";printf \"%15s %5d %s %s\",$2,$1,r,\"\\n\";}'<\/strong><\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/bell.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4822\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/bell.png\" alt=\"bell.png\" width=\"850\" height=\"437\" \/><\/a><\/p>\n<p>To plot each distinct value only once, remove the duplicate lines with uniq after sort:<\/p>\n<p><span style=\"color: #0000ff;\">awk 'BEGIN{RS=\", \" } {ORS=\"\\n\"; print $1, \"\\t\\t\" ($1-38.8) \"\\t\\t\" (($1-38.8)*($1-38.8)), \"\\t\" (($1-38.8)*($1-38.8))\/20, \"\\t\" ($1-38.8)\/11.4 \" \\t\" sqrt( (($1-38.8)\/11.4)*(($1-38.8)\/11.4)) \" \\t\" ($1-38.8)\/38.8 \" \\t\" sqrt( 
(($1-38.8)\/38.8)*(($1-38.8)\/38.8) ) \" \\t\" 1-sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8) ) \" \\t\" 100*(1-sqrt( (($1-38.8)\/38.8)*(($1-38.8)\/38.8))) }' traveltimes.txt | sort -n | uniq | awk '{print $1,$9}' | awk '{print $1,$2}' | awk '!max{max=$2;}{r=\"\";i=s=50*$2\/max;while(i--&gt;0)r=r\"#\";printf \"%15s %5d %s %s\",$2,$1,r,\"\\n\";}'<\/span><\/p>\n<p>This may be useful to you on Linux because I could not, for the life of me, find how to generate a normal distribution curve in LibreOffice Calc or Gnumeric! Hence the webpage example below.<\/p>\n<p>Just bear in mind that these proportions are not the same as the z-table values below, which give the area under the standard normal curve for each z-score. They are close enough for an ASCII sketch, though:<\/p>\n<p><a href=\"https:\/\/statistics.laerd.com\/statistical-guides\/standard-score.php\"><span style=\"color: #00ff00;\">https:\/\/statistics.laerd.com\/statistical-guides\/standard-score.php<\/span><\/a><\/p>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Standard_normal_table\"><span style=\"color: #00ff00;\">https:\/\/en.wikipedia.org\/wiki\/Standard_normal_table<\/span><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/ztable.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4828\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/ztable.png\" alt=\"ztable.png\" width=\"956\" height=\"376\" \/><\/a><\/p>\n<p>Since you now have the mean (38.8) and standard deviation (11.4), you can use an online grapher site such as:<\/p>\n<p><a href=\"https:\/\/www.mathcracker.com\/normal_probability.php\"><span style=\"color: #00ff00;\">https:\/\/www.mathcracker.com\/normal_probability.php<\/span><\/a><\/p>\n<p>and plot the normal probability distribution for these values:<\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/tails.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter 
size-medium wp-image-4800\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/tails.png\" alt=\"tails.png\" width=\"268\" height=\"196\" \/><\/a> <a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/calcs.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4801\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/calcs.png\" alt=\"calcs.png\" width=\"746\" height=\"551\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/graph.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4802\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/graph.png\" alt=\"graph.png\" width=\"870\" height=\"525\" \/><\/a><\/p>\n<p><span style=\"color: #0000ff;\">cat traveltimes.txt<\/span><\/p>\n<p><span style=\"color: #ff0000;\">26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34<\/span><\/p>\n<p>However, you need to understand that these example figures do NOT follow a normal distribution, which is symmetrical about the mean as in the graph above. These values are lopsided: the maximum of 65 is about 2.3 standard deviations (11.4) above the mean of 38.8, while the minimum of 25 is only about 1.2 standard deviations below it. 
This post is meant to show awk's abilities rather than explain statistics, but the subject is interesting:<\/p>\n<p><a href=\"https:\/\/statistics.laerd.com\/statistical-guides\/standard-score.php\"><span style=\"color: #00ff00;\">https:\/\/statistics.laerd.com\/statistical-guides\/standard-score.php<\/span><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/skewness.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4806\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/skewness.png\" alt=\"skewness.png\" width=\"754\" height=\"366\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/percents.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4808\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/percents.png\" alt=\"percents.png\" width=\"607\" height=\"423\" \/><\/a><\/p>\n<p>The above samples are positively skewed: the mean (38.8) is greater than the median (35.5) because the long tail of values stretches out to the right of the mean, up to 65.<\/p>\n<p>Using the median (35.5) for a right-tailed calculation gives:<\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/right-tailed.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4810\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/right-tailed.png\" alt=\"right-tailed.png\" width=\"711\" height=\"732\" \/><\/a><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/stdnorm.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-4836\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/stdnorm.png\" alt=\"stdnorm.png\" width=\"735\" height=\"495\" \/><\/a><\/p>\n<p><iframe loading=\"lazy\" title=\"Normal Distribution and z Scores Explained - Introductory Statistics\" width=\"1778\" height=\"1000\" 
src=\"https:\/\/www.youtube.com\/embed\/mFYvUxOO2T4?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p><iframe loading=\"lazy\" title=\"Normal Distributions, Standard Deviations, Modality, Skewness and Kurtosis: Understanding concepts\" width=\"1778\" height=\"1000\" src=\"https:\/\/www.youtube.com\/embed\/HnMGKsupF8Q?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>Here's another example data set (mean 10.5) that you can use to check the numbers:<\/p>\n<p><span style=\"color: #0000ff;\">cat 20.txt<\/span><br \/>\n<span style=\"color: #ff0000;\">1.00<\/span><br \/>\n<span style=\"color: #ff0000;\">2.00<\/span><br \/>\n<span style=\"color: #ff0000;\">3.00<\/span><br \/>\n<span style=\"color: #ff0000;\">4.00<\/span><br \/>\n<span style=\"color: #ff0000;\">5.00<\/span><br \/>\n<span style=\"color: #ff0000;\">6.00<\/span><br \/>\n<span style=\"color: #ff0000;\">7.00<\/span><br \/>\n<span style=\"color: #ff0000;\">8.00<\/span><br \/>\n<span style=\"color: #ff0000;\">9.00<\/span><br \/>\n<span style=\"color: #ff0000;\">10.00<\/span><br \/>\n<span style=\"color: #ff0000;\">11.00<\/span><br \/>\n<span style=\"color: #ff0000;\">12.00<\/span><br \/>\n<span style=\"color: #ff0000;\">13.00<\/span><br \/>\n<span style=\"color: #ff0000;\">14.00<\/span><br \/>\n<span style=\"color: #ff0000;\">15.00<\/span><br \/>\n<span style=\"color: #ff0000;\">16.00<\/span><br \/>\n<span style=\"color: #ff0000;\">17.00<\/span><br \/>\n<span style=\"color: #ff0000;\">18.00<\/span><br \/>\n<span style=\"color: #ff0000;\">19.00<\/span><br \/>\n<span style=\"color: #ff0000;\">20.00<\/span><br \/>\n<span style=\"color: #ff0000;\">20.00<\/span><br \/>\n<span style=\"color: #ff0000;\">19.00<\/span><br \/>\n<span style=\"color: #ff0000;\">18.00<\/span><br \/>\n<span style=\"color: 
#ff0000;\">17.00<\/span><br \/>\n<span style=\"color: #ff0000;\">16.00<\/span><br \/>\n<span style=\"color: #ff0000;\">15.00<\/span><br \/>\n<span style=\"color: #ff0000;\">14.00<\/span><br \/>\n<span style=\"color: #ff0000;\">13.00<\/span><br \/>\n<span style=\"color: #ff0000;\">12.00<\/span><br \/>\n<span style=\"color: #ff0000;\">11.00<\/span><br \/>\n<span style=\"color: #ff0000;\">10.00<\/span><br \/>\n<span style=\"color: #ff0000;\">9.00<\/span><br \/>\n<span style=\"color: #ff0000;\">8.00<\/span><br \/>\n<span style=\"color: #ff0000;\">7.00<\/span><br \/>\n<span style=\"color: #ff0000;\">6.00<\/span><br \/>\n<span style=\"color: #ff0000;\">5.00<\/span><br \/>\n<span style=\"color: #ff0000;\">4.00<\/span><br \/>\n<span style=\"color: #ff0000;\">3.00<\/span><br \/>\n<span style=\"color: #ff0000;\">2.00<\/span><br \/>\n<span style=\"color: #ff0000;\">1.00<\/span><\/p>\n<p><span style=\"color: #0000ff;\">awk '{print $1, \" \\t\\t\" ($1-10.5)\/36.5 \" \\t\\t\" sqrt((($1-10.5)\/36.5)*(($1-10.5)\/36.5)) \" \\t\\t\" (1-sqrt((($1-10.5)\/36.5)*(($1-10.5)\/36.5)))*100}' 20.txt | sort -n | uniq | awk '{print $1,$4}' | awk '{print $1,$2}' | awk '!max{max=$2;}{r=\"\";i=s=40*$2\/max;while(i--&gt;0)r=r\"#\";printf \"%15s %5d %s %s\",$2,$1,r,\"\\n\";}'<\/span><\/p>\n<p><a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/40.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-4834\" src=\"https:\/\/stevepedwards.today\/DebianAdmin\/wp-content\/uploads\/2016\/09\/40.png\" alt=\"40.png\" width=\"690\" height=\"454\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AWK can be used in a non-GUI environment as a pretty impressive, if limited, spreadsheet simulation to generate the correct data at each stage of a basic stats process that you may do in a tech maths class, for example. 
It is limited by its built-in maths functions, which are still considerable: Calling <a href=\"https:\/\/stevepedwards.today\/DebianAdmin\/awk-as-a-limited-spreadsheet-simulation-for-stats\/\" class=\"more-link\">...<span class=\"screen-reader-text\">\u00a0 AWK as a Limited Spreadsheet Simulation for Stats<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-4777","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"a3_pvc":{"activated":true,"total_views":1,"today_views":0},"_links":{"self":[{"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/posts\/4777","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/comments?post=4777"}],"version-history":[{"count":2,"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/posts\/4777\/revisions"}],"predecessor-version":[{"id":10057,"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/posts\/4777\/revisions\/10057"}],"wp:attachment":[{"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/media?parent=4777"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/categories?post=4777"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/stevepedwards.today\/DebianAdmin\/wp-json\/wp\/v2\/tags?post=4777"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}