Create A Statistics Distribution
Filed in: Java
Create a Distribution given a list of data
Following on from the Monte Carlo tutorial and the Low Latency LongAdder tutorial, I wrote a simple class that takes the results of a Monte Carlo simulation, cuts them into statistical bins, and produces the data for a simple graph in a spreadsheet.
First you need to understand what we mean by statistical bins. Suppose I have a set of results ranging from 0 to 100, where 0 is the lowest and 100 the highest number. I can cut this range into 10 bins, so 0 to 10 is the first bin, 10 to 20 is the second bin, and so on. You can then count how many times a number fell in each bin and maintain that count. So if we had lots of results between 0 and 10, the first bin's count goes up every time it sees something in that range.
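As a minimal sketch of the binning idea above (not the tutorial's own class), the snippet below cuts the range 0 to 100 into 10 bins and counts a handful of made-up values. The class and method names are hypothetical, and clamping the maximum into the last bin is an assumption of this sketch.

```java
// Hypothetical demo class; the names are illustrative only.
class BinIndexDemo {

    // Width of each bin when the range [min, max] is cut into `bins` equal parts.
    static double binWidth(double min, double max, int bins) {
        return (max - min) / bins;
    }

    // Index of the bin that `value` falls into: lower edge inclusive, upper edge exclusive.
    // Assumption: the maximum itself is clamped into the last bin so it does not open an extra one.
    static int binIndex(double value, double min, double max, int bins) {
        int index = (int) ((value - min) / binWidth(min, max, bins));
        return Math.min(index, bins - 1);
    }

    public static void main(String[] args) {
        double[] results = {0.0, 3.5, 9.99, 10.0, 42.0, 100.0}; // made-up sample values
        long[] counts = new long[10];
        for (double r : results) {
            counts[binIndex(r, 0.0, 100.0, 10)]++;
        }
        for (int i = 0; i < counts.length; i++) {
            System.out.println("[" + (i * 10) + "->" + ((i + 1) * 10) + ") , " + counts[i]);
        }
    }
}
```

Here 9.99 lands in the first bin and 10.0 in the second, which is the inclusive-from, exclusive-to rule in action.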
To do this we simply create an interval value in the class Histogram and then use the Statistics class, which holds the LongAdder counters, to update the count every time we see something in that range.
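The Statistics class itself comes from the LongAdder tutorial and is not reproduced here. As a rough sketch of what such a counter might look like, assuming it keeps a map from bin key to a LongAdder (the class name StatisticsSketch and the count method are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Assumed shape of a bin counter; the real Statistics class may differ.
class StatisticsSketch {

    // Bin key -> number of times that bin was hit.
    final Map<Integer, LongAdder> freqs = new ConcurrentHashMap<>();

    // Create the counter the first time a key is seen, then increment it.
    void add(int key) {
        freqs.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Current count for a bin, or 0 if the bin was never hit.
    long count(int key) {
        LongAdder adder = freqs.get(key);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        StatisticsSketch stats = new StatisticsSketch();
        double interval = 10.0;
        for (double x : new double[] {1.0, 7.5, 12.0, 99.0}) {
            stats.add((int) (x / interval)); // same key calculation as dividing by the interval
        }
        System.out.println(stats.count(0)); // 2
    }
}
```

ConcurrentHashMap plus LongAdder keeps the counting safe if several threads call add at once, which is the usual reason to reach for LongAdder in the first place.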
We also work out the start and end of each interval using simple maths. A lot of the work is easier to explain by reading the code:
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
public class Histogram {
    private final double interval;
    private Statistics stats = new Statistics();

    public Histogram(List<Double> list, int bins) {
        super(); //implied but let's be explicit
        //Be defensive: copy the list so the caller cannot change it after handing it to us
        CopyOnWriteArrayList<Double> listCopy = new CopyOnWriteArrayList<>(list);
        //Clean out any nulls
        listCopy.removeIf(e -> e == null);
        //double min = list.stream().min(Comparator.comparing(Number::doubleValue)).get();
        double max = listCopy.stream().max(Comparator.comparing(Double::doubleValue)).get();
        //NOTE: assumes non-negative data with max >= 1, otherwise bins can end up as 0
        if (bins > max) bins = (int) max;
        //This is the width of each bin between Min and Max,
        //so if Min=0 and Max=100 and bins=10 then we have an interval of 10, i.e. 100/10
        this.interval = max / bins;
        storeList(listCopy);
    }
    /*
     * Given a value, work out the Key or Bin to put it in
     */
    private void createStoreMapKey(Double x) {
        //the explicit cast to int truncates, so every value inside a bin gets the same key
        int k = (int) (x / interval);
        //Under the covers this adds the Key and counts how many times the key is seen.
        stats.add(k);
    }

    /*
     * Map the members in the List to the appropriate buckets
     */
    private void storeList(List<Double> list) {
        for (Double x : list) {
            createStoreMapKey(x);
        }
    }
    /*
     * We have the Keys, but how do they relate to the Max and Min in the list, i.e. which bin?
     * Here we map the Bin numbers to the real world values in the List from the Constructor
     */
    private Map<Double, Integer> createRowMapping() {
        Map<Double, Integer> rowMap = new HashMap<>();
        for (Integer i : stats.freqs.keySet()) {
            //i*interval is the lower bound of bin i
            rowMap.put(i * interval, i);
        }
        return rowMap;
    }
    /*
     * The heart of the Class.
     * This takes the Row Mapping and takes the Bins from stats and maps the two together,
     * so we end up with Bin counts as Values and the lower bound of each Bin as Keys
     */
    public Map<Double, Long> createFinalMap() {
        Map<Double, Long> finalMap = new HashMap<>();
        Map<Double, Integer> mymap = createRowMapping();
        for (Double s : mymap.keySet()) {
            //System.out.println(String.format(" %s = %d ",s, mymap.get(s) ) );
            Integer value = mymap.get(s);
            finalMap.put(s, stats.freqs.get(value).longValue());
        }
        return finalMap;
    }

    public double getInterval() {
        return interval;
    }
    public static void main(String[] args) {
        Histogram h;
        Random rand = new Random();
        //ArrayList is not thread safe, so wrap it before the worker threads add to it
        List<Double> list = java.util.Collections.synchronizedList(new ArrayList<>());
        //Create thread pool of 100
        ExecutorService producer = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 100000; i++) {
            producer.submit(() -> {
                MySpecialMCRandomWalk rw = new MySpecialMCRandomWalk();
                rw.Walk();
                Double d = rw.getRes();
                list.add(d);
            });
        }
        //Stop accepting new tasks; without this awaitTermination always runs to its timeout
        producer.shutdown();
        //Now wait for all tasks to finish
        try {
            producer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            //Preserve interrupt status so others are aware and can act
            Thread.currentThread().interrupt();
        } finally {
            //Cancel anything still running after a timeout or interrupt, otherwise app will not end.
            producer.shutdownNow();
        }
        //Create Histogram
        h = new Histogram(list, 100);
        //Create Final Map
        Map<Double, Long> finalMap = h.createFinalMap();
        Set<Double> keySet = finalMap.keySet();
        List<Double> newList = keySet.stream().sorted().collect(Collectors.toList());
        //Print Map of Histogram bins and their data
        //NOTE from is Inclusive but to is Exclusive, so when you look at variables from and to
        //Number can be >= the from number
        //Number must be < the to number
        for (int i = 0; i < newList.size(); i++) {
            double from = newList.get(i);
            double to = from + h.getInterval();
            long res = finalMap.get(newList.get(i));
            System.out.println(" [" + from + "->" + to + ") , " + res);
        }
    }
}
Now you have this data fed by an inline Monte Carlo. NOTE: never do this in prod. It is awful to create a new object for every single task, but I did it here to keep everything in one simple class for explanation. I would recommend creating some Futures and then using CompletableFuture for the results.
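One possible shape for the CompletableFuture approach is sketched below. It is only an illustration: simulateOnce is a hypothetical stand-in for a single Monte Carlo run (a plain 100-step random walk here), and the class name, pool size and run count are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

class FutureSimulationSketch {

    // Hypothetical stand-in for one Monte Carlo run; swap in the real simulation.
    static double simulateOnce() {
        double position = 0.0;
        for (int step = 0; step < 100; step++) {
            position += ThreadLocalRandom.current().nextBoolean() ? 1.0 : -1.0;
        }
        return Math.abs(position);
    }

    // Run `runs` simulations on a pool of `threads` workers and return every result.
    static List<Double> runAll(int runs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Double>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                CompletableFuture<Double> f =
                        CompletableFuture.supplyAsync(FutureSimulationSketch::simulateOnce, pool);
                futures.add(f);
            }
            // join() blocks until each future completes, so no shared list or timeout is needed.
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Double> results = runAll(10_000, 8);
        System.out.println("Collected " + results.size() + " results");
    }
}
```

Because each task returns its value through its own future, the worker threads never touch a shared collection, and the resulting list can be handed straight to the Histogram constructor.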
Once you have all your simulations you can then look at their Histogram details or Distribution which looks something like this:

Note that this lets us see the distribution of the data. This is quite different from the average. The average is not necessarily the highest peak on the graph (that peak is the most common bin); it is the mean of the underlying values, which are hidden behind the LongAdder counts.
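To make the difference concrete, here is a small sketch with a made-up, skewed sample. It computes the mean of the raw values and, separately, the lower bound of the most populated bin. The class and method names are hypothetical and the sample is invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class MeanVersusPeakSketch {

    // Arithmetic mean of the raw values.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Lower bound of the bin with the highest count (ties go to the lowest bin).
    static double peakBinStart(double[] values, double interval) {
        Map<Double, Long> counts = new TreeMap<>();
        for (double v : values) {
            double start = Math.floor(v / interval) * interval;
            counts.merge(start, 1L, Long::sum);
        }
        double best = Double.NaN;
        long bestCount = -1;
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Skewed, made-up sample: most values are small, a few are large.
        double[] sample = {1, 2, 3, 4, 5, 6, 55, 95, 99};
        System.out.println("peak bin starts at " + peakBinStart(sample, 10.0));
        System.out.println("mean = " + mean(sample));
    }
}
```

With this sample the peak is the bin starting at 0.0, while the mean is 30.0, a value that falls in a bin containing no data at all.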
