The hackathon

Astrohack is over!

Thanks to all participants, volunteers and partners for helping organise and support this event! Make sure to visit the winners page to check out their winning solution.

The hackathon goal

TL;DR

In this hackathon we will try to estimate the stellar masses (the total mass in stars) of a set of 10,000 to 100,000 galaxies. It’s open to everyone: the problem is straightforward enough that you don’t need ANY knowledge of astronomy.
It will take place from 28/04 19:00 to 30/04 12:00 in Ghent (most likely in or around Plateau).
The winning team, chosen by a jury, will be invited to write a scientific A1 paper together with the organisers.

The problem

Thanks to increasingly accurate telescopes, space-based observations and growing digitisation, astronomers nowadays have access to vast amounts of data on millions of galaxies. A question they often need to answer is how to accurately determine the total stellar mass: the sum of the masses of all the individual stars in a galaxy. This is not an easy task because, unlike most other scientists, astronomers do not have a controlled environment in which they can look at their subject from any angle. Instead, they focus mainly on extracting as much information as possible from the light of these distant galaxies.

The classic method to determine the stellar mass of a galaxy is to compare the total amount of light at two different wavelengths, for example blue and red light. However, galaxies are often observed in only one band, without additional information from other wavelengths, and we are faced with the following question: “Is it possible to estimate the stellar mass directly from a black-and-white image?” Luckily, some known relations hint that this could be feasible. It has been known for some time that there is a close relation between the morphology of galaxies and their overall color. In general, disk galaxies with a clear spiral structure tend to contain more young, blue stars than elliptical galaxies, which are made up of older, red stars.

Hubble sequence for galaxy classification. Note the blue spiral galaxies compared with the red elliptical galaxies.
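The classic two-band method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the accurate reference method used to build the hackathon's stellar-mass table: it assumes both fluxes share one magnitude zero point (22.5, as for SDSS fluxes in nanomaggies), that the distance is given in megaparsecs and treated as a luminosity distance, and that the mass-to-light ratio follows a linear color relation log10(M/L_i) = a + b (g - i). The function name `stellar_mass_from_color` is hypothetical, and the coefficients `a`, `b` and the Sun's i-band absolute magnitude `m_sun_i` are deliberately left as parameters to be taken from a published calibration.

```python
import math


def stellar_mass_from_color(flux_g, flux_i, distance_mpc, a, b, m_sun_i, zero_point=22.5):
    """Return an estimate of log10(M*/Msun) from total g- and i-band fluxes.

    Assumptions (illustrative, not the hackathon's reference method):
    - both fluxes use the same magnitude zero point (22.5 for SDSS nanomaggies),
    - log10(M/L_i) = a + b * (g - i), with a and b from a published calibration,
    - distance_mpc is a luminosity distance in megaparsecs.
    """
    # Apparent magnitudes from total fluxes.
    m_g = zero_point - 2.5 * math.log10(flux_g)
    m_i = zero_point - 2.5 * math.log10(flux_i)
    color = m_g - m_i  # larger value = redder galaxy

    # Distance modulus 5*log10(d / 10 pc), with d in Mpc: 5*log10(d) + 25.
    abs_mag_i = m_i - (5.0 * math.log10(distance_mpc) + 25.0)

    # i-band luminosity in solar units.
    log_lum_i = -0.4 * (abs_mag_i - m_sun_i)

    # The color-dependent mass-to-light ratio converts light into mass.
    log_mass_to_light = a + b * color
    return log_lum_i + log_mass_to_light
```

With b > 0, a redder g - i color raises the assumed mass-to-light ratio, so two galaxies with the same i-band luminosity but different colors get different mass estimates. That color information is exactly what is missing when only one band is available.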

The data

The data set will consist of roughly 15,000 to 100,000 galaxies observed in two wavelength bands, the g and i bands, and comes directly from the Sloan Digital Sky Survey (SDSS). It will be converted to a set of ASCII tables so it can easily be imported and manipulated with any analytics tool. In addition to the images, we will provide a table with the distances to all galaxies and their stellar masses as determined using a very accurate method. In the first part of the hackathon, participants can use both wavelengths to build an accurate model for the stellar mass. In the second, more difficult part, only the g-band image can be used.

The set consists of 10,000 to 100,000 galaxies.
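A sketch of how the two parts of the challenge differ in practice. It assumes, hypothetically, that the catalogue table has the columns `galaxy_id`, `distance` and `stellar_mass`, and that each galaxy's images are available per band as nested lists of pixel values. The helper names are illustrative only; the actual column names and file layout are defined by the released data.

```python
import csv
import io


def load_catalogue(fileobj):
    """Read the galaxy table into a dict keyed by galaxy id.

    Column names are hypothetical: galaxy_id, distance, stellar_mass.
    """
    catalogue = {}
    for row in csv.DictReader(fileobj):
        catalogue[row["galaxy_id"]] = {
            "distance": float(row["distance"]),
            "stellar_mass": float(row["stellar_mass"]),
        }
    return catalogue


def allowed_bands(part):
    """Part 1 may use both the g and i bands; part 2 only the g band."""
    if part == 1:
        return ("g", "i")
    if part == 2:
        return ("g",)
    raise ValueError(f"unknown part: {part}")


def build_features(images_by_band, distance, part):
    """Total flux in each allowed band, followed by the distance."""
    features = [sum(map(sum, images_by_band[band])) for band in allowed_bands(part)]
    features.append(distance)
    return features


if __name__ == "__main__":
    table = io.StringIO("galaxy_id,distance,stellar_mass\nG1,120.5,10.3\n")
    cat = load_catalogue(table)
    images = {"g": [[1, 2], [3, 4]], "i": [[2, 2], [2, 2]]}
    print(build_features(images, cat["G1"]["distance"], part=2))
```

Keeping the band restriction in one place (`allowed_bands`) makes it hard to leak i-band information into a part-2 model by accident.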

The data will be preprocessed with all the standard astronomical corrections, so participants do not need to worry about them. The images will also be cropped around the galaxies so that they are not too large and the target is clearly in the center of the image. To make sure everyone is able to work with the data in their favourite tool, the data is saved as plain, flat CSV files. Basically, you can consider each image to be a matrix whose dimensions correspond to the image resolution, with each number in the matrix giving the brightness (or flux) of the galaxy in that pixel.

The data will be converted to flat CSV files.
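Because each image is just a matrix of fluxes, a few lines of NumPy are enough to load a cut-out and derive simple quantities from it. This sketch assumes one galaxy image per CSV file, comma-separated, with no header row; the function names are hypothetical. The flux-weighted centroid is a quick check that the galaxy really sits near the center of the cut-out.

```python
import io

import numpy as np


def load_image(source):
    """Load one galaxy cut-out stored as a flat CSV matrix of pixel fluxes."""
    return np.loadtxt(source, delimiter=",", ndmin=2)


def total_flux(image):
    """Sum of all pixel values: the galaxy's total brightness in this band."""
    return float(image.sum())


def flux_centroid(image):
    """Flux-weighted (row, column) center of the image."""
    rows, cols = np.indices(image.shape)
    flux = image.sum()
    return float((rows * image).sum() / flux), float((cols * image).sum() / flux)


if __name__ == "__main__":
    demo = load_image(io.StringIO("0,1,0\n1,4,1\n0,1,0\n"))
    print(total_flux(demo), flux_centroid(demo))
```

For a well-centered cut-out of shape (n, m), the centroid should lie close to ((n - 1) / 2, (m - 1) / 2).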

TensorFlow-proof

This research question is the main reason for organising this hackathon, as it is very well suited to present-day machine learning algorithms and software. Essentially, the problem is easy to describe: we only need to predict one variable for each image. With the help of the astronomy department of Ghent University and the people from Datatonic, the problem and the hackathon were designed to be a good mix of a relevant research topic and an ideal use case for TensorFlow. This will be a nice opportunity for people who attended the Data Science Ghent TensorFlow trainings, given by Datatonic, to test their skills. However, we explicitly invite other participants as well, and everyone is free to choose their own tools to tackle this problem.
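Since the target is a single number per image, a plain linear regression on a few image-derived features gives a useful baseline to compare a TensorFlow network against. This is a minimal sketch using NumPy instead of TensorFlow; which features to use (for example log total flux and log distance) is left open, and the RMS scatter shown here is just one common way to quantify predictive power, not a criterion stated by the organisers. The function names are hypothetical.

```python
import numpy as np


def _design_matrix(features):
    """Prepend a column of ones so the fit includes an intercept."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(features)), features])


def fit_linear_baseline(features, targets):
    """Ordinary least-squares fit of targets on features (plus intercept)."""
    coef, *_ = np.linalg.lstsq(_design_matrix(features), np.asarray(targets, dtype=float), rcond=None)
    return coef


def predict(coef, features):
    """Apply the fitted coefficients to new feature rows."""
    return _design_matrix(features) @ coef


def rms_scatter(predicted, true):
    """Root-mean-square difference between predictions and reference values."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

If the targets are logarithmic stellar masses, the RMS scatter is expressed in dex, which makes a baseline and a neural network directly comparable on the same held-out galaxies.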

Computing power

As the problem is quite computationally intensive, the Flemish Supercomputer Center (VSC) was kind enough to provide support. During the entire hackathon, all contestants will have access to at least 13 dedicated GPU nodes and between 20 and 40 dedicated CPU nodes (with around 16 CPUs each). In addition, one or two people will provide technical support for the duration of the hackathon to help contestants submit their scripts, optimise the runtime of their code, etc.
More information on the tools that can be used on the VSC computers will appear soon.

The reward

The winning team will be chosen by a jury who will take into account predictive power, performance, visualisations, etc. More information on when and how the winners will be announced will follow later. The winning team will be invited to write an A1 scientific research paper on their winning approach together with the organisers.

Scientific references:
Dieleman et al. 2015, arXiv:1503.07077
Zibetti et al. 2009, arXiv:0904.4252
Salim et al. 2016, arXiv:1610.00712
Kelvin et al. 2014, arXiv:1407.7555
Reines et al. 2015, arXiv:1508.06274
Hughes et al. 2013, arXiv:1207.4191
Renzini et al. 2015, arXiv:1502.01027