Machine learning and interpolation

I have been working on bigger problems recently. That is in the University of Sydney. The results will soon be published if all goes well but I wanted to go over shortly, the bigger ideas of what it is.

It has been shown that random forests are equivalent to a weighted nearest neighbour method as is explained well on Wikipedia. With the amazing libraries that exist for machine learning, that makes spatial data very fun to work with confidently.

What I have been working on is mapping a big data set that has very recently been compiled, to a global map. This is an imporvement on all older maps as they tend to disagree greatly, haveing been made from data that has since then been deemed bad. It is of the seafloor and the geophysicists around me tell me it is very exiting stuff. What do I know. With the mathematical formulation around it and the nice libraries you do next to no work. The first result came in yesterday and I show here below.

Kaggle

One of the things I have been doing outside of work is participating in Kaggle contests. There is an interesting one that connects two hobbies of mine, mathematics and chess. We are set to determine ELO rating from one game for a player.

We are giver a list of moves from one game and the corresponding score for each position. There were other participants that shared their experience and with my next attempts this will prove helpful.

First I wanted to see what the easiest approach would give. I standardized the data and into a certain regressor I fed the mean values of each games and the standard deviasions. This being justified by the argument that a good player of chess is a consistent player.

This landed me in place 38 on the leaderboard, out of 42. The important part being that my score was in the range of the better participants.

GSS

I have been experimenting with web development as it is all around me. HTML is straightforward and neither fun nor boring, it is mostly necessary. CSS however, as simple as it seems can be very difficult to use and I have spent many hours trying to position a div or have things line up properly.

An exciting addition was introduced a few months ago, GSS or grid stylesheets. It uses linear optimization methods to solve for positions and sizes through a pretty legible syntax. For example you can constrain a margin based on the margin of the parent element

margin-right: == ::parent[margin-right] * -1;

You can watch the founder of the company behind this library in a recently released walk-through of making a simple profile page.

In an attempt to learn to use this new library I am going to try to imitate this blog layout that I use one step at a time. The first part I have put a link to in the sidebar. I made two divs, black and white and a button to take you back to this page.

In GSS that can be written as follows

#nav-bar {
    background-color: black;
    color: white;
    height: == ::window[height];
    width: == ::window[width]/3;
    position: fixed !important;
}


#content {
    display: block;
    height: == ::window[height];
    ::[left] == #nav-bar[right];
    width: == ::window[width] - #nav-bar[width];
}

What is happening here is that I give the sidebar a width and then the content I give the rest of the space on the screen. The sidebar is fixed as will be apparent when I add content to the page. That was not easy but through the keyword !important, it gives those constraints more importance and the sidebar truly is fixed.

Benefits for scientists of learning a little CS

I got by with writing terrible code for a long time. Only when you either need to read your code much later or when you are dealing with massive amounts of calculations do you realise the value of good coding. With accessible languages like Python and Matlab or Octave you can do powerful things without knowing much more than for loops and a little googling.

To demonstrate the benefits of learning a little CS terminology I wanted to show a small example where using objects can really clean up your code. Integrating a 1D function should give a clear idea of how this would work. It is common when you start learning the to solve differential equations to use the following basis functions.

They look simple but how would you represent this in code? One idea is to code it directly for the problem you are trying to solve. The downside to that is first and foremost that it makes it hard to read the code and even worse to use it again if you need to change the problem for any reason. This is when it is good to be comfortable with objects, the core of object oriented programming. Python offers this paradigm and that is what I have used here. Since the interpolation functions are all the same, only shifted, you can call each element an object and just specify the attributes which here would be xn and xn+1.

class element:
    def __init__(self,x0,x1,LtoG):
        self.x0 = x0
        self.x1 = x1
        self.LtoG = LtoG
        self.dx = x1-x0

LtoG representing the mapping of the element to the grid. The basis is represented along with its derivative

def basis(self,x):
    if x < self.x0 or x > self.x1:
        return [0.0, 0.0]
    return [ (self.x1-x)/self.dx, (x-self.x0)/self.dx ]

def dbasis(self,x):
    if x < self.x0 or x > self.x1:
        return [0.0, 0.0]
    return [ -1.0/self.dx, 1.0/self.dx ]

You could obviously take higher derivatives if you want but generally when you learn this you will be using weak forms where the highest derivative will be the first one.

This makes the solution much more general. You could even take it as far as making the interpolation functions an attribute to the objects. I will not show that here. You can email me if you want a discussion about that. To close, this is what you will do to setup the using of this class.

E = []
for e in range(n-1):
 E.append( element(e*dx,(e+1)*dx,[e,e+1]) )

Or as a one-liner

E = [element(e*dx,(e+1)*dx,[e,e+1]) for e in range(n-1)]

This makes it such that all you need to specify is what is specific to your problem, the equation you are solving and the boundary conditions.

Euler Project

In an attempt to practice Javascript writing and hone my problem solving at the same time I set up with solving Project Euler problems. A restriction I set myself was to use no for or while loops. This can be compensated for through recursions.

As an example I will talk about problem 9.

A Pythagorean triplet is a set of three natural numbers, a < b < c, for which,

a2 + b2 = c2 For example, 32 + 42 = 9 + 16 = 25 = 52.

There exists exactly one Pythagorean triplet for which a + b + c = 1000. Find the product abc.

This is an interesting problem because although you could blindly program a solver of pythagorean triplets there are a lot of information to be read from the description. If you spend a little time with a pen and paper it can give a very nice solution for your code. My approach was to parametrise b and c with a. Firstly with the obvious c = 1000 - b - a, then you put that into Pythagoras' theorem, do algebra and get

b = (1000a-500000)/(a-1000)

Now the problem is only dependant on one parameter, I know that a solution exists and that it is unique since it's given so I can blindly start at a reasonable number, 1 or bigger and iterate.

function addP9(a) {
            var b = (1000 * a - 500000)/(a - 1000);
            var c = 1000 - b - a;
            if (a*a + b*b == c*c && b%1 == 0) {
                return a*b*c;
            }
            return addP9(a+1);
        }

a and c are automatically integers. b is not however and there exists a solution where it isn't so you need to check for that as well.

Similarly I solved many other problems for Project Euler and you can see them on my github or reach out to me via email.