Alexandre Abraham's website

Study of the ABIDE dataset

TL;DR

The NYU and CMU sites have not been reviewed (for the moment) because of technical problems. I have applied several filters to the remaining data to obtain a dataset suited for prediction:

We keep subjects that have IQ tests based on the Wechsler scale and that pass the quality check (194 subjects). The quality check summary is available here.

ABIDE dataset

ABIDE is a publicly available dataset of resting-state fMRI data on autism. It is a very heterogeneous dataset, as the data come from 17 different sites using different protocols.

The purpose of this blog post is to explain how we sorted these subjects, based on several criteria, so that the selection can be shared. Do not hesitate to contact me for further details.

ABIDE official website

I will refer several times to the phenotypic information contained in the ABIDE dataset. The phenotypic CSV file is also required to reproduce all the figures presented here. You can download it from the ABIDE official website after registration. As this file is not public, I am not able to distribute it.

ABIDE sites

ABIDE data comes from 17 different sites. For us, the fewer sites the better, as site-specific noise and artefacts may creep into the data.

Site name  Total  Control  Autistic
CALTECH       38       19        19
CMU           27       13        14
KKI           55       33        22
LEUVEN        64       35        29
MAX_MUN       57       33        24
NYU          184      105        79
OHSU          28       15        13
OLIN          36       16        20
PITT          57       27        30
SBL           30       15        15
SDSU          36       22        14
STANFORD      40       20        20
TRINITY       49       25        24
UCLA         109       47        62
UM           145       77        68
USM          101       43        58
YALE          56       28        28
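
The table above can be rebuilt from the phenotypic file. Here is a minimal sketch using only the Python standard library. It assumes that the columns are named SITE_ID and DX_GROUP and that DX_GROUP is 1 for autism and 2 for control; check these assumptions against your copy of the file. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the phenotypic CSV (not real ABIDE data).
SAMPLE = """SITE_ID,SUB_ID,DX_GROUP
SITE_A,1,1
SITE_A,2,2
SITE_A,3,2
SITE_B,4,1
"""


def count_by_site(rows):
    """Return {site: (total, control, autistic)} from an iterable of dict rows."""
    control = Counter()
    autistic = Counter()
    for row in rows:
        # Assumption: DX_GROUP is "1" for autism and "2" for control.
        if row["DX_GROUP"] == "2":
            control[row["SITE_ID"]] += 1
        elif row["DX_GROUP"] == "1":
            autistic[row["SITE_ID"]] += 1
    sites = sorted(set(control) | set(autistic))
    return {s: (control[s] + autistic[s], control[s], autistic[s]) for s in sites}


counts = count_by_site(csv.DictReader(io.StringIO(SAMPLE)))
for site, (total, n_control, n_autistic) in counts.items():
    print(site, total, n_control, n_autistic)
```

To run it on the real file, replace io.StringIO(SAMPLE) with an open file handle on the downloaded CSV.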

A quality check has been done on most of these scans. To avoid site-related effects, sites from which only a few subjects remained were removed (this is the case for Pitt, which had only 4 good subjects). These subjects are tagged "group" in the final report.
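
The site-level filter described above can be sketched as follows. The tuple layout and the min_subjects threshold are hypothetical: the post only states that sites with too few good subjects were removed, not the exact cutoff.

```python
from collections import Counter


def keep_large_sites(subjects, min_subjects):
    """Keep QC-passed subjects from sites that retain at least min_subjects of them.

    subjects: list of (subject_id, site, passed_qc) tuples (hypothetical layout).
    """
    passed = [s for s in subjects if s[2]]
    per_site = Counter(site for _, site, _ in passed)
    return [s for s in passed if per_site[s[1]] >= min_subjects]


# Made-up example: SITE_B keeps a single good subject and is dropped entirely.
subjects = [
    (1, "SITE_A", True), (2, "SITE_A", True), (3, "SITE_A", False),
    (4, "SITE_B", True), (5, "SITE_B", False),
]
kept = keep_large_sites(subjects, min_subjects=2)
print([subject_id for subject_id, _, _ in kept])
```

Counting is done after the quality check, so a large site can still be dropped if most of its scans fail.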

Quality checking

Quality checking consists in verifying the quality of the fMRI data. fMRI acquisition is noisy, and some strong artefacts may persist in the data.

Several typical artefacts have been observed in the data.

Ghost

A Ghost is an echo of the fMRI image. We can see it on the image below: the image of the brain seems to be repeated in the empty area around the brain.

Figure: Ghost example on subject Caltech 51480 (increase contrast if you cannot see it)

Scanner artefacts

The MRI scanner acquires data slice by slice across the brain. Artefacts resulting from this acquisition can sometimes be seen in the data and are easily recognized by their straight-line pattern.

Figure: Scanner artefact example on subject UCLA 51294

Brain cut

Sometimes, the whole brain is not acquired by the scanner. This may be on purpose (you can acquire only the part of the brain you need to increase resolution on this particular area) or this can be an accident.

As autism seems to be related to areas located at the bottom of the brain (cerebellum) and at the top, we cannot allow partial brains in our data. The Caltech site has been excluded because of this problem.

Figure: Brain cut on subject Leuven 50727

Preprocessing problems

Sometimes, brain registration goes wrong, resulting in completely distorted images. These subjects obviously have to be removed.

Figure: Registration problem on subject Yale 50553

Autism

Before going further into the dataset evaluation, we will take a closer look at autism disorders. I believe that understanding this disease is necessary to look at the data more accurately.

People with autism are not all severely affected by it. Autism is expressed in 3 domains:

This is incapacitating because it impacts everyday tasks. If you want to understand what it is like to live with such a disease, I recommend the books of Daniel Tammet, an Asperger savant who is known for learning a foreign language in days and for reciting 22,514 decimals of pi:

Adaptive Functioning

The American Association on Mental Retardation recognizes three broad domains of adaptive functioning:

Adaptive behaviors are everyday living skills such as walking, talking, getting dressed, going to school, going to work, preparing a meal, cleaning the house, etc. They are skills that a person learns in the process of adapting to his/her surroundings. Since adaptive behaviors are for the most part developmental, it is possible to describe a person's adaptive behavior as an age-equivalent score. An average five-year-old, for example, would be expected to have adaptive behavior similar to that of other five-year-olds.

Autism spectrum

The ABIDE classification follows the DSM-IV classification of autism. There are 3 degrees of severity:

This classification is subject to controversy as it is very difficult to segment the whole autism spectrum. In DSM-V, these distinctions will disappear under the general name of "autism". The severity of the symptoms in each domain will then be determined by several tests.

Phenotypic data

Several scales exist to measure the severity of the symptoms. Unfortunately, each scale comes with its test and protocol.

In order to run classification or regression algorithms on our data, we want to find a measure that satisfies some constraints:

The Autism-Spectrum Quotient (AQ) is, for example, a well-known score. Unfortunately, it does not distinguish verbal from behavioral problems. Moreover, it is not available for all subjects across ABIDE.

The AQ test is available online, on Wired for example. I had a score of 32, which is just above the limit to be considered autistic. However, I think that the help of a psychiatrist is needed to interpret the results.

Distribution of tests across ABIDE

Here is a relatively big infographic showing the tests available per subject in the dataset. It is fairly simple to interpret: a blank box means that no score is available. Colored boxes show the score of the test (red being the maximum and yellow the minimum).
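
The same information can be summarized numerically as the fraction of subjects with a usable score for each test. This is a minimal sketch, not the script linked in this post. It assumes that a missing score is either an empty cell or the value -9999, and the column names in the made-up sample are only illustrative.

```python
import csv
import io

MISSING = {"", "-9999"}  # assumption: how absent scores are encoded

# Made-up sample standing in for the phenotypic CSV (not real ABIDE data).
SAMPLE = """SUB_ID,FIQ,VIQ,ADOS_TOTAL
1,110,105,
2,98,-9999,12
3,101,99,-9999
"""


def availability(rows, score_columns):
    """Return {column: fraction of subjects with a usable score}."""
    rows = list(rows)
    if not rows:
        return {}
    result = {}
    for column in score_columns:
        present = sum(1 for row in rows if row[column].strip() not in MISSING)
        result[column] = present / len(rows)
    return result


fractions = availability(
    csv.DictReader(io.StringIO(SAMPLE)), ["FIQ", "VIQ", "ADOS_TOTAL"]
)
for column, fraction in fractions.items():
    print(f"{column}: {fraction:.0%} of subjects")
```

A test that is present for only a small fraction of subjects is a poor candidate for a prediction target across the whole dataset.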

As the image is huge, it is not included here. You can see it by clicking here. You can also grab the script to generate it here.

One can see that only a few tests are available across the entire dataset: FIQ, VIQ and PIQ. The other tests are only available for some sites, and there is not enough data to help us find a correlation between them.

Intelligence Quotients (IQ)

Several IQ measures are available in ABIDE phenotypic data:

The Full Scale IQ (FIQ) is an aggregation of the Verbal IQ (VIQ) and the Performance IQ (PIQ) that takes other phenotypic parameters into account, such as the age of the subject. Unfortunately, a quick glance at the FIQ scores reveals that FIQ is almost always equal to the mean of VIQ and PIQ (it is always equal for the UM site, for example). According to the psychologists we met at the Pasteur Institute, this should not be the case, and it tells us that FIQ does not bring much more information than VIQ and PIQ.
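
This observation is easy to check. Here is a minimal sketch that measures how often FIQ matches the mean of VIQ and PIQ. The tolerance of 0.5 is an assumption meant to absorb rounding to whole points, and the scores are made up.

```python
def fiq_equals_mean(fiq, viq, piq, tolerance=0.5):
    """True when FIQ is within tolerance of the mean of VIQ and PIQ."""
    return abs(fiq - (viq + piq) / 2) <= tolerance


def fraction_matching(records, tolerance=0.5):
    """records: list of (fiq, viq, piq) tuples with no missing values."""
    if not records:
        return 0.0
    hits = sum(fiq_equals_mean(f, v, p, tolerance) for f, v, p in records)
    return hits / len(records)


# Made-up scores: the first two rows match the mean, the third does not.
records = [(100, 95, 105), (112, 110, 115), (120, 100, 125)]
print(f"{fraction_matching(records):.0%} of subjects have FIQ = mean(VIQ, PIQ)")
```

Running this per site shows which sites report an FIQ that is simply the mean of the two other scores.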

If we take a closer look at the phenotypic data, we can see that several types of tests have been used to measure these quotients. Worse, even if these tests are usually designed to produce scores centered on 100 with a standard deviation of 15, not all tests have the same bounds.

Comparison of IQ tests

The script to compute all these figures is available here.

Figure: Comparison of IQ test scores. Suffix 'C' means control and suffix 'A' means autistic

We cannot tell much from this figure, only that VIQ may be a better score to separate autistic from control subjects. But nothing is certain.

A first intuition about test type would be to keep only the tests related to the Wechsler Intelligence Scale. These are the majority of the tests, and we can expect similar scores from them.

The Flynn effect is an observed increase of the mean IQ across the population of about 3 points every 10 years. To compensate for this effect, tests are revised regularly so that they give scores centered on 100 with a standard deviation of 15. This is why you will see WISC_III and WISC_IV, for example.
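
To make the numbers concrete, here is a small worked sketch. It assumes a simple linear mapping from a raw score to the IQ scale and a linear drift of 3 points per decade; real Wechsler tests rely on age-specific norm tables, so this is only an illustration of the arithmetic.

```python
def deviation_iq(raw_score, norm_mean, norm_sd):
    """Map a raw score to the IQ scale (mean 100, standard deviation 15)."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z


def flynn_drift(years_since_norming, points_per_decade=3.0):
    """Expected inflation of the mean score when using outdated norms."""
    return points_per_decade * years_since_norming / 10


# A raw score one standard deviation above the norming sample mean.
print(deviation_iq(raw_score=60, norm_mean=50, norm_sd=10))
# Expected drift of the population mean on norms that are 20 years old.
print(flynn_drift(20))
```

Under these assumptions, two revisions of the same test normed 20 years apart could differ by about 6 points on the same population, which is one reason not to pool test versions blindly.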

Figure: VIQ test scores by test type for control subjects

Figure: PIQ test scores by test type for control subjects

The first decision that we can take from this analysis is to eliminate the unlabeled tests. We also exclude WISC_III and WISC_IV_FULL because they do not cover enough subjects.

PPVT and Ravens seem to give very dissimilar measures depending on the site they come from. On the other hand, WAIS, WASI and WISC, which are based on the Wechsler Intelligence Scale, seem to agree on the mean. I believe that this is the least risky choice among the measures.
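
The comparison behind these plots boils down to averaging the scores of control subjects per test type. Here is a minimal sketch with made-up rows. The field names (group, viq, viq_test) are hypothetical and do not match the column names of the phenotypic file.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_test(rows, score_key, test_key):
    """Average score per test type, restricted to control subjects."""
    scores = defaultdict(list)
    for row in rows:
        # Rows with an empty test label are the "unlabeled" tests and are skipped.
        if row["group"] == "control" and row[test_key]:
            scores[row[test_key]].append(row[score_key])
    return {test: mean(values) for test, values in scores.items()}


# Made-up rows, not real ABIDE data.
rows = [
    {"group": "control", "viq": 110, "viq_test": "WASI"},
    {"group": "control", "viq": 106, "viq_test": "WASI"},
    {"group": "control", "viq": 109, "viq_test": "WAIS"},
    {"group": "autistic", "viq": 95, "viq_test": "WASI"},
    {"group": "control", "viq": 120, "viq_test": ""},
]
means = mean_score_by_test(rows, "viq", "viq_test")
for test, value in sorted(means.items()):
    print(test, value)
```

Restricting to controls avoids mixing the effect of the test type with the effect of the diagnosis.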

A plot of IQ scores over sites is available here. It is not included in this post because it is very hard to interpret. Still, it is a plot to keep in mind since, at some sites like UM, autistic patients seem to have better scores than controls.

Conclusion

The ABIDE dataset seems to provide a lot of subjects and measures but, when we take a closer look at the data, we realize that we have to throw out at least half of it. This is because we have high standards, as outliers can have a noticeable impact on the results of our algorithm.

To summarize our selection, we will keep subjects that have IQ tests based on the Wechsler scale and that pass the quality check. The quality check summary is available here.
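
Put together, the selection can be sketched as a single filter. The field names, the test labels in the allow-list and the choice to require an allowed test for both VIQ and PIQ are assumptions made for this example, not necessarily the exact rule used for the final report.

```python
def select_subjects(rows, allowed_tests):
    """Keep rows that passed QC and whose VIQ and PIQ test types are both allowed."""
    return [
        row for row in rows
        if row["qc_passed"]
        and row["viq_test"] in allowed_tests
        and row["piq_test"] in allowed_tests
    ]


# Hypothetical labels for the Wechsler-based tests that are kept.
allowed = {"WAIS", "WASI", "WISC"}

# Made-up rows, not real ABIDE data.
rows = [
    {"id": 1, "qc_passed": True, "viq_test": "WASI", "piq_test": "WASI"},
    {"id": 2, "qc_passed": False, "viq_test": "WASI", "piq_test": "WASI"},
    {"id": 3, "qc_passed": True, "viq_test": "PPVT", "piq_test": "RAVENS"},
    {"id": 4, "qc_passed": True, "viq_test": "", "piq_test": "WAIS"},
]
selected = [row["id"] for row in select_subjects(rows, allowed)]
print(selected)
```

Subject 2 fails the quality check, subject 3 uses non-Wechsler tests and subject 4 has an unlabeled VIQ test, so only subject 1 remains.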
