Create A Statistics Distribution

Filed in: Java

Create a Distribution given a list of data

Following on from the Monte Carlo tutorial and the Low Latency LongAdder tutorial, I wrote a simple class that takes the results of a Monte Carlo simulation, cuts them into statistical bins, and lets you create a simple graph in a spreadsheet from the results.
First you need to understand what we mean by statistical bins. If I have a set of results ranging from 0 to 100, where 0 is the lowest number and 100 the highest, then I can cut this range into 10 bins: 0 to 10 is the first bin, 10 to 20 is the second bin, and so on. You can then count how many times a number falls in each bin and maintain that count. So if we had lots of results between 0 and 10, the count for the first bin would go up every time it sees something in that range.
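As a minimal sketch of the bin arithmetic described above, assuming the range starts at 0: the bin width is the maximum divided by the number of bins, and a value's bin is the value divided by that width, truncated to an int. The names BinIndexDemo, interval and binIndex are illustrative and not taken from the tutorial's classes.

```java
// Sketch only: BinIndexDemo, interval and binIndex are illustrative names.
final class BinIndexDemo {

    // Width of one bin when the range 0..max is cut into the given number of bins.
    static double interval(double max, int bins) {
        return max / bins;
    }

    // Index of the bin a value falls into: [0,10) -> 0, [10,20) -> 1, and so on.
    static int binIndex(double value, double interval) {
        return (int) (value / interval);
    }

    public static void main(String[] args) {
        double interval = interval(100.0, 10);
        double[] samples = {0.0, 9.99, 10.0, 57.3, 99.9, 100.0};
        for (double s : samples) {
            System.out.println(s + " -> bin " + binIndex(s, interval));
        }
    }
}
```

One thing this makes visible: each bin includes its lower bound and excludes its upper bound, so the maximum value itself (100 here) lands in an extra bin with index 10 rather than in the tenth bin.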
To do this we simply create an interval value in the class Histogram and then use the Statistics class, which holds the LongAdder counters, to update a count every time we see something in that range.
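The Statistics class comes from the LongAdder tutorial and is not reproduced in this article. The following is only a guess at its shape, inferred from how the Histogram code uses it (a freqs map keyed by bin number and an add method); the class name StatisticsSketch and the choice of ConcurrentHashMap are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the Statistics class; the real one may differ.
final class StatisticsSketch {

    // Bin key -> how many times that key has been seen.
    final Map<Integer, LongAdder> freqs = new ConcurrentHashMap<>();

    // Count one more occurrence of the key, creating its counter on first use.
    void add(int key) {
        freqs.computeIfAbsent(key, k -> new LongAdder()).increment();
    }
}
```

With this shape, calling add(0) twice and add(3) once leaves freqs holding a count of 2 for bin 0 and 1 for bin 3, which is what createFinalMap later reads back with longValue().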
We also work out the start and end of each interval using simple maths. A lot of the work is easier to explain by reading the code:



import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
 
public class Histogram {
      private final double interval;
     
      private Statistics stats = new Statistics();
     
      public Histogram(List<Double> list, int bins) {
            super(); //implied but let's be explicit
           
            //Be defensive: copy the list so the user cannot change it after sending it to us
            CopyOnWriteArrayList<Double> listCopy = new CopyOnWriteArrayList<>(list);
            //Clean out any nulls
            listCopy.removeIf(e -> e == null);
           
            //double min = listCopy.stream().min(Comparator.comparing(Double::doubleValue)).get();
            //NOTE: assumes a non-empty list of non-negative values, because the bins start at 0
            double max = listCopy.stream().max(Comparator.comparing(Double::doubleValue)).get();
            if(bins > max) bins=(int) max;
            //Guard against zero bins (max < 1), which would divide by zero below
            if(bins < 1) bins = 1;
            //This is the width of each bin over the range 0->Max,
            //so if Max=100 and bins=10 then we have an interval of 10 ie: 100/10
            this.interval = max/bins;
 
            storeList(listCopy);
           
      }
     
      /*
       * Given a value, work out the key (bin) to put it in
       */
      private void createStoreMapKey(Double x){
            //explicit cast to int truncates, which gives the bin index
            int k =   (int) (x / interval) ;
            //Under the covers this adds the key and counts how many times the key is seen.
            stats.add(k);
           
      }
      /*
       * Map the members in the List to the appropriate buckets
       */
      private void storeList(List<Double> list){
            for( Double x: list){
                  createStoreMapKey(x);
            }
      }
     
      /*
       * We have the keys, but how do they relate to the values in the list, ie which bin?
       * Here we map each bin's start value (in the units of the list from the constructor) to its bin number
       */
      private Map<Double, Integer> createRowMapping(){
            Map<Double, Integer> rowMap = new HashMap<>();
            for(Integer i: stats.freqs.keySet()){                
                  rowMap.put(i*interval, i);
            }
           
            return rowMap;
      }
     
      /*
       * The heart of the class.
       * This takes the row mapping and the bin counts from stats and maps the two together,
       * so we end up with bin start values as keys and bin counts as values
       */
      public Map<Double, Long> createFinalMap(){
            Map<Double, Long> finalMap = new HashMap<>();
           
            Map<Double, Integer> mymap = createRowMapping();
            for(Double s :mymap.keySet()){
                  //System.out.println(String.format(" %s = %d ",s, mymap.get(s) ) );
                  Integer value = mymap.get(s);
                  finalMap.put(s, stats.freqs.get(value).longValue() );
            }
           
            return finalMap;
      }
     
      public double getInterval(){
            return interval;
      }
     
      public static void main(String[] args) {
            Histogram h ;
            //Worker threads add results concurrently, so the list must be thread-safe
            List<Double> list = java.util.Collections.synchronizedList(new ArrayList<>());
           
            //Create thread pool of 100
            ExecutorService producer = Executors.newFixedThreadPool(100);
           
            for(int i=0; i<100000; i++){ 
                  producer.submit(() -> {
                        MySpecialMCRandomWalk rw = new MySpecialMCRandomWalk();
                        rw.Walk();
                        Double d = rw.getRes();
                        list.add(d);
                  } );       
            }
           
            //Stop accepting new tasks; tasks already submitted still run to completion
            producer.shutdown();
            //Now wait for all threads to finish
            try {
                  if(!producer.awaitTermination(5, TimeUnit.SECONDS)){
                        //Timed out so force shutdown, otherwise app will not end
                        producer.shutdownNow();
                  }
            } catch (InterruptedException e) {
                  //Interrupted so force shutdown
                  producer.shutdownNow();
                  //Preserve interrupt status so others are aware and can act
                  Thread.currentThread().interrupt();
            }
 
            //Create Histogram
            h= new Histogram(list, 100);
            //Create Final Map
            Map<Double, Long> finalMap = h.createFinalMap();
            Set<Double> keySet = finalMap.keySet();
            List<Double> newList = keySet.stream().sorted().collect(Collectors.toList());
 
            //Print the histogram bins and their counts
            //NOTE from is inclusive but to is exclusive, so a number in a bin
            //is >= the from number and < the to number
            for(int i=0; i<newList.size(); i++){
                  double from,to;
                  long res;
                  from = newList.get(i);
                  to= newList.get(i)+h.getInterval();
                  res=finalMap.get(newList.get(i));
                  System.out.println(" ["+ from+"->"+to+") , " +res);
            }
           
      }
 
}

Now you have this data, fed by an inline Monte Carlo. NOTE: never do this in prod. It is awful to create a new instance of the class for every single task, but I did it here to keep everything in one simple class for explanation. I would recommend creating some Futures and then using CompletableFuture for the results.
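One way the CompletableFuture suggestion could look is sketched below. It is not the tutorial's implementation: FutureCollectDemo, simulate and runAll are made-up names, and simulate is a random stand-in for a real Monte Carlo run such as MySpecialMCRandomWalk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

// Sketch only: all names here are illustrative.
final class FutureCollectDemo {

    // Hypothetical stand-in for one Monte Carlo run; returns a value in [0, 100).
    static double simulate() {
        return ThreadLocalRandom.current().nextDouble(100.0);
    }

    // Run the simulations on a small pool and gather the results once all are done.
    static List<Double> runAll(int runs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Double>> futures = new ArrayList<>(runs);
            for (int i = 0; i < runs; i++) {
                futures.add(CompletableFuture.supplyAsync(FutureCollectDemo::simulate, pool));
            }
            // join() blocks until each future completes, so the worker threads
            // never touch a shared mutable list.
            List<Double> results = new ArrayList<>(runs);
            for (CompletableFuture<Double> f : futures) {
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task returns its value instead of writing into a shared list, the result list is only ever touched by the calling thread, and it can be handed straight to the Histogram constructor.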
Once you have all your simulations you can then look at their Histogram details, or Distribution, by charting the printed bins in a spreadsheet.
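To get the bins into a spreadsheet, one option is to write them out as CSV. The sketch below assumes a map of bin start value to count, like the one createFinalMap returns; HistogramCsvDemo, toCsv and the column names are illustrative choices.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: HistogramCsvDemo and toCsv are illustrative names.
final class HistogramCsvDemo {

    // Turn bin start -> count into CSV text, one row per bin, sorted by bin start.
    static String toCsv(Map<Double, Long> bins, double interval) {
        StringBuilder sb = new StringBuilder("from,to,count\n");
        // TreeMap sorts the keys so the rows come out in ascending bin order.
        for (Map.Entry<Double, Long> e : new TreeMap<>(bins).entrySet()) {
            double from = e.getKey();
            sb.append(String.format(Locale.ROOT, "%.2f,%.2f,%d\n",
                    from, from + interval, e.getValue()));
        }
        return sb.toString();
    }
}
```

Saving the returned text to a .csv file gives three columns that a spreadsheet can chart directly as a bar graph.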

Note that this allows us to see the Distribution of the data. This is quite different from the Average, which is not the highest peak on the graph; it is the average of the underlying values across the graph, which are hidden once they have been reduced to LongAdder counts.
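A small made-up data set shows the difference between the average and the highest peak. In this sketch (MeanVsPeakDemo and its methods are illustrative names), five values sit in the first bin and one outlier sits far to the right, so the fullest bin and the bin containing the mean are not the same.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: illustrative names and made-up sample data.
final class MeanVsPeakDemo {

    // Arithmetic mean of the raw values.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Index of the bin holding the most values (the highest peak of the histogram).
    static int modalBin(double[] values, double interval) {
        Map<Integer, Integer> counts = new HashMap<>();
        int bestBin = -1;
        int bestCount = 0;
        for (double v : values) {
            int bin = (int) (v / interval);
            int count = counts.merge(bin, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                bestBin = bin;
            }
        }
        return bestBin;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 95};
        System.out.println("mean = " + mean(values));
        System.out.println("peak bin = " + modalBin(values, 10.0));
    }
}
```

Here the peak is bin 0, covering [0, 10), while the mean is about 18.3 and falls in bin 1, a bin that holds no values at all.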

Arif Jaffer

Polyglot Paradise