December has been a busy month, mainly with paperwork. Almost everything is now set up. Some paperwork remains, and I am getting closer and closer to having internet at home. In fact, the internet should be working... I hope there are no other problems to solve on that side.
We managed to download the ABIDE dataset. Elvis started building a pipeline to preprocess it with SPM. I followed his work and tried to figure out why some problems were occurring (mainly registration problems). I also looked at a few randomly chosen subject scans, about 3 per acquisition center, to determine which data may be problematic. I have written a quick summary on my wiki (ABIDE dataset summary). Some scans are cut off. Others seem to have really weird contrast problems: some brain areas are very bright in all scan frames. This does not always seem to be on the same side, but we must keep in mind that some of these subjects may be unreliable.
NISL
I took some time to clear unfinished business on Nisl. This is not wasted time, as developing Nisl is not orthogonal to my thesis. I made a patch to use a Butterworth filter instead of an FFT for rescaling. I have also fixed some bugs.
Kamitani
One of the main examples is Kamitani. Unfortunately, we had to remove it because the dataset was moved to another format. To get it back into Nisl, I had to convert this new dataset format into something easily readable by Nisl (NIfTI format). Now the example can be reintegrated, which is really good news because Kamitani is a visually very nice example. Unfortunately, we still have no host for our converted data, so it can't be made public for the moment. I have also added the ability to generate an animated GIF to promote some results.
Kamitani reconstruction using OMP
I have revised the ABIDE dataset fetcher. It is now very easy to fetch and filter subjects from this dataset. Filtering is based on the phenotypic file provided with the dataset; any criterion in this file can be used.
Multi Subject Dictionary Learning
I went back over the code to simplify it, make it a little faster, and refactor it using some components now integrated in Nisl. This gave me the opportunity to understand the code better.
One of our leads for speeding up the computation was to warm restart the computation of Vs. Gael had some papers in mind but, unfortunately, none of them discussed warm restarting an SVD. One of them, however, deals with computing ridge regression for different values of the regularization parameter. The author warm restarts the computation using the results of a ridge regression with another value of the regularization parameter. This can't be applied to our problem.
Gael then told me to look at the ARPACK source to see whether some warm restart is possible. I took a look at the ARPACK reference papers (Lehoucq96), but the Wikipedia article makes it definitely much easier to understand how the algorithm works. As expected, the algorithm can be restarted and, if we have a succession of closely related problems, the result of the previous computation may be a good starting point, as explained in the ARPACK documentation. I tried this method on random matrices: I compute the eigenvalues, slightly modify the matrix, and compute them again. The computation is slightly faster (10%).
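The random-matrix experiment above can be sketched as follows. This is only an illustration under assumptions, not the code actually used: the matrix size, the perturbation scale and the way the previous eigenvectors are combined into a single starting vector are all choices made for the example. It relies on the `v0` argument of `scipy.sparse.linalg.eigsh`, which sets ARPACK's starting vector.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
n, k = 200, 5  # assumed sizes, chosen only for the example

# Random symmetric matrix standing in for the real problem.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Cold start: ARPACK picks its own random starting vector.
vals_a, vecs_a = eigsh(A, k=k, which="LA")

# Slightly modified matrix (kept symmetric).
E = rng.standard_normal((n, n))
B = A + 1e-3 * (E + E.T) / 2

# Warm start: ARPACK takes a single starting vector, so the previous
# eigenvectors are summed into one vector (an assumption of this sketch).
v0 = vecs_a.sum(axis=1)
vals_b, vecs_b = eigsh(B, k=k, which="LA", v0=v0)

print(np.sort(vals_b))
```

Timing the two `eigsh` calls on `B`, with and without `v0`, is one way to measure the gain on a given matrix.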
The idea now is to gather realistic data to test the optimization in a more operational setting. I will try several solvers from scipy.sparse.linalg (cg, eigsh, lobpcg...) on this data and benchmark them with restarting. I will then integrate them into the code to benchmark the MSDL (using a hard-coded switch).
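A hard-coded switch between solvers could look like the sketch below. Everything named here (`SOLVER`, `top_eigenpairs`, the `init` argument) is hypothetical and not part of the MSDL code, and the test matrix is synthetic, with five well-separated leading eigenvalues so that both solvers converge quickly. The warm start goes through `v0` for `eigsh` and through the initial block `X` for `lobpcg`.

```python
import numpy as np
from scipy.sparse.linalg import eigsh, lobpcg

SOLVER = "eigsh"  # hypothetical hard-coded switch: "eigsh" or "lobpcg"


def top_eigenpairs(A, k, init=None, solver=SOLVER):
    """Return the k largest eigenpairs of symmetric A, largest first.

    init is an optional (n, k) array of eigenvectors from a previous,
    closely related problem, used as a warm start.
    """
    n = A.shape[0]
    if solver == "eigsh":
        # ARPACK takes one starting vector: sum the previous eigenvectors.
        v0 = None if init is None else init.sum(axis=1)
        vals, vecs = eigsh(A, k=k, which="LA", v0=v0)
    elif solver == "lobpcg":
        # LOBPCG takes a whole block of starting vectors.
        if init is None:
            init = np.random.default_rng(0).standard_normal((n, k))
        vals, vecs = lobpcg(A, init, largest=True, maxiter=200)
    else:
        raise ValueError("unknown solver: %r" % solver)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Synthetic symmetric matrix with 5 dominant eigenvalues (assumption).
rng = np.random.default_rng(0)
n, k = 200, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.concatenate([[100.0, 90.0, 80.0, 70.0, 60.0], rng.uniform(0, 1, n - k)])
A = (Q * d) @ Q.T
A = (A + A.T) / 2
E = rng.standard_normal((n, n))
B = A + 1e-3 * (E + E.T) / 2  # closely related problem

for solver in ("eigsh", "lobpcg"):
    _, prev = top_eigenpairs(A, k, solver=solver)
    vals, _ = top_eigenpairs(B, k, init=prev, solver=solver)
    print(solver, vals)
```

Keeping the switch as a single argument makes it easy to wrap each call in a timer and compare cold and warm starts for every solver on the same data.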