Study of the ABIDE dataset
The NYU and CMU sites have not been reviewed (for the moment) because of technical problems. I have applied several filters to the remaining data to obtain a dataset well suited for prediction:
- Removing NYU and CMU: 901 subjects
- Quality checking: 302 subjects
- IQ score based on the Wechsler scale: 303 subjects
We keep subjects that have IQ tests based on the Wechsler scale and pass the quality check (194 subjects). Quality check summary is available here.
ABIDE is a publicly available dataset of resting-state fMRI data on autism. It is a very heterogeneous dataset, as the data come from 17 different sites using different protocols.
The purpose of this blog post is to explain how we selected these subjects, based on several criteria, and to share the result. Do not hesitate to contact me for further details.
ABIDE official website
I will refer several times to the phenotypic information contained in the ABIDE dataset. The phenotypic CSV file is also required to reproduce all the figures presented here. You can download it from the ABIDE official website after registration. As this file is not public, I am not able to distribute it.
ABIDE data come from 17 different sites. For us, the fewer sites the better, as site-specific noise and artefacts may creep into the data.
A quality check has been done on most of these scans. To avoid site-related effects, sites from which only a small number of subjects remained were removed (this is the case for Pitt, which had only 4 good subjects). These subjects are tagged "group" in the final report.
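The site-level filter described above can be sketched as follows. This is only an illustration: the records are made up, and the threshold `MIN_SUBJECTS_PER_SITE` is an assumed value, not the one used in this study.

```python
from collections import Counter

# Hypothetical records: (site, subject number) pairs that passed the quality check.
# The subject numbers are invented for the example.
passed_qc = [
    ("Pitt", 1), ("Pitt", 2), ("Pitt", 3), ("Pitt", 4),
    ("UM", 10), ("UM", 11), ("UM", 12), ("UM", 13), ("UM", 14), ("UM", 15),
]

MIN_SUBJECTS_PER_SITE = 5  # assumed threshold, chosen only for this example


def drop_small_sites(records, min_subjects):
    """Keep only the records whose site has at least `min_subjects` subjects."""
    counts = Counter(site for site, _ in records)
    return [(site, sub) for site, sub in records if counts[site] >= min_subjects]


kept = drop_small_sites(passed_qc, MIN_SUBJECTS_PER_SITE)
kept_sites = sorted({site for site, _ in kept})
print(kept_sites)
```

With these toy records, the site with only 4 remaining subjects is dropped as a whole, which is the behaviour described for Pitt.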
Quality checking consists of verifying the quality of the fMRI data. fMRI acquisition is noisy, and some strong artefacts may persist in the data.
Several typical artefacts have been observed in the data.
A Ghost is an echo of the fMRI image. We can see it on the image below: the image of the brain seems to be repeated in the empty area around the brain.
Ghost example on subject Caltech 51480 (increase contrast if you cannot see it)
The MRI scanner acquires data slice by slice over the brain. Some artefacts resulting from this acquisition can sometimes be seen in the data and are easily recognized because of their straight-line pattern.
Scanner artefact example on subject UCLA 51294
Sometimes, the whole brain is not acquired by the scanner. This may be on purpose (you can acquire only the part of the brain you need in order to increase the resolution in that particular area) or it may be an accident.
Because autism seems to be related to areas located at the bottom of the brain (the cerebellum) and at the top, we cannot allow partial brain coverage in our data. The Caltech site was excluded because of this problem.
Brain cut on subject Leuven 50727
Sometimes, brain registration goes wrong, resulting in completely distorted images. These subjects obviously have to be removed.
Registration problem on subject Yale 50553
Before going further into the dataset evaluation, we will take a closer look at autism spectrum disorders. I believe that understanding this condition is necessary to look at the data more accurately.
People with autism are not all severely affected by it. Autism is expressed in 3 domains:
- verbal: difficulty expressing oneself and understanding other people
- behavioral: characterized by repetitive behaviors
- social: difficulty interacting, playing and living with other people
This is incapacitating because it affects everyday tasks. If you want to understand what it is like to live with such a condition, I recommend the books of Daniel Tammet, a savant with Asperger syndrome who is known for learning a foreign language in days and for reciting 22,514 decimals of pi.
The American Association on Mental Retardation recognizes three broad domains of adaptive functioning:
Adaptive behaviors are everyday living skills such as walking, talking, getting dressed, going to school, going to work, preparing a meal, cleaning the house, etc. They are skills that a person learns in the process of adapting to his/her surroundings. Since adaptive behaviors are for the most part developmental, it is possible to describe a person's adaptive behavior as an age-equivalent score. An average five-year-old, for example, would be expected to have adaptive behavior similar to that of other five-year-olds.
The ABIDE classification follows the DSM-IV classification of autism. There are 3 degrees of severity:
- Asperger, a high-functioning form of autism
- PDD-NOS (Pervasive Developmental Disorder Not Otherwise Specified), a generic name for autistic disorders that do not cover the three domains mentioned above.
This classification is controversial, as it is very difficult to segment the whole autism spectrum. In DSM-V, these distinctions will disappear under the general name of "autism". The severity of the symptoms in each domain will then be determined by several tests.
Several scales exist to measure the severity of the symptoms. Unfortunately, each scale comes with its test and protocol.
In order to run classification or regression algorithms on our data, we want to find a measure that satisfies some constraints:
- available for as many subjects as possible
- consistent across the different sites
- related to a specific symptom of the disease (as we expect to find a relation between the severity of verbal symptoms and the language brain network)
The Autism-Spectrum Quotient (AQ) is, for example, a well-known score. Unfortunately, it does not distinguish between verbal and behavioral problems. In addition, it is not available for all subjects across ABIDE.
The AQ test is available online, on Wired for example. I had a score of 32, which is just above the limit to be considered autistic. However, I think that the help of a psychiatrist is needed to interpret the results.
Distribution of tests across ABIDE
Here is a relatively large infographic showing the tests available for each subject in the dataset. It is fairly simple to interpret: a blank box means that no score is available. Colored boxes show the score of the test (red being the maximum and yellow the minimum).
As the image is huge, it is not included here. You can see it by clicking here. You can also grab the script to generate it here.
One can see that only a few tests are available across the entire dataset: FIQ, VIQ and PIQ. The other tests are available only for some sites, and there is not enough data to help us find a correlation between them.
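To make the notion of "available across the dataset" concrete, here is a minimal sketch that computes, for each score column, the fraction of subjects with a usable value. The rows are invented, the column names (`FIQ`, `VIQ`, `ADOS_TOTAL`) are assumed to match the phenotypic file, and treating an empty string or `-9999` as missing is also an assumption to check against your copy of the file.

```python
# Assumed markers for a missing score in the phenotypic file.
MISSING = {"", "-9999"}


def coverage(rows, columns):
    """Return, for each column, the fraction of rows holding a usable score."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if row.get(col, "").strip() not in MISSING) / total
        for col in columns
    }


# Toy rows shaped like the output of csv.DictReader (all values are strings).
rows = [
    {"FIQ": "103", "VIQ": "98", "ADOS_TOTAL": "-9999"},
    {"FIQ": "110", "VIQ": "", "ADOS_TOTAL": "12"},
    {"FIQ": "95", "VIQ": "101", "ADOS_TOTAL": ""},
    {"FIQ": "-9999", "VIQ": "120", "ADOS_TOTAL": "-9999"},
]

cov = coverage(rows, ["FIQ", "VIQ", "ADOS_TOTAL"])
for name, fraction in cov.items():
    print(f"{name}: {fraction:.0%}")
```

On the real file, the same function could be fed with `list(csv.DictReader(f))` to rank the scores by coverage.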
Intelligence Quotients (IQ)
Several IQ measures are available in ABIDE phenotypic data:
- Full Scale IQ (FIQ)
- Verbal IQ (VIQ)
- Performance IQ (PIQ), measured on performance for everyday tasks.
The Full Scale IQ is an aggregation of the Verbal IQ (VIQ) and the Performance IQ (PIQ) that also takes other phenotypic parameters into account, such as the age of the subject. Unfortunately, a quick glance at the FIQ scores reveals that FIQ is almost always equal to the mean of VIQ and PIQ (it is always equal for the UM site, for example). According to the psychologists we met at the Pasteur Institute, this should not be the case; it suggests that, in this dataset, FIQ does not bring much more information than VIQ and PIQ.
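The observation about FIQ can be checked with a few lines of code. The sketch below counts how often FIQ falls within a tolerance of the mean of VIQ and PIQ. The subjects are invented, the 1-point tolerance is an arbitrary choice to absorb rounding, and missing scores are assumed to have been converted to `None` beforehand.

```python
def fraction_fiq_equals_mean(subjects, tolerance=1.0):
    """Fraction of subjects whose FIQ is within `tolerance` of (VIQ + PIQ) / 2.

    Subjects with a missing score (None) are ignored.
    Returns None when no subject has all three scores.
    """
    complete = [
        s for s in subjects
        if s["FIQ"] is not None and s["VIQ"] is not None and s["PIQ"] is not None
    ]
    if not complete:
        return None
    close = sum(
        1 for s in complete
        if abs(s["FIQ"] - (s["VIQ"] + s["PIQ"]) / 2) <= tolerance
    )
    return close / len(complete)


# Invented scores, for illustration only.
subjects = [
    {"FIQ": 100, "VIQ": 95, "PIQ": 105},    # mean 100.0: FIQ matches
    {"FIQ": 112, "VIQ": 110, "PIQ": 115},   # mean 112.5: within tolerance
    {"FIQ": 90, "VIQ": 100, "PIQ": 100},    # mean 100.0: FIQ differs by 10
    {"FIQ": None, "VIQ": 100, "PIQ": 100},  # incomplete: ignored
]

ratio = fraction_fiq_equals_mean(subjects)
print(f"{ratio:.2f}")
```

Running it per site would show where FIQ is just a recomputed average, as observed for UM.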
If we take a closer look at the phenotypic data, we can see that several types of tests have been used to measure these quotients. Worse, even if these tests are usually designed to yield scores centered on 100 with a standard deviation of 15, not all tests have the same boundaries.
Comparison of IQ tests
The script to compute all these figures is available here.
Comparison of IQ tests scores. Suffix 'C' means control and suffix 'A' means autistic
We cannot tell much from this figure, only that VIQ may be a better score for separating autistic and control subjects, but nothing is certain.
A first intuition about test type would be to keep only the tests related to the Wechsler intelligence scale. They are the majority of the tests, and we can expect similar scores from them.
The Flynn effect is an observed increase in the mean IQ of the population of about 3 points every 10 years. To compensate for this effect, tests are revised regularly so that they give scores centered on 100 with a standard deviation of 15. This is why you will see WISC_III and WISC_IV, for example.
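As a back-of-the-envelope illustration of why test revisions matter, the rate quoted above can be turned into an expected drift of the mean score for a test whose norms are a given number of years old. This is a simple linear extrapolation of the "3 points every 10 years" figure, not a validated correction, and the function name is made up for the example.

```python
FLYNN_POINTS_PER_DECADE = 3.0  # approximate rate quoted in the text


def expected_drift(years_since_norming):
    """Expected upward drift of the mean IQ score, assuming a linear Flynn effect."""
    return FLYNN_POINTS_PER_DECADE * years_since_norming / 10.0


# A test normed 15 years before it is administered.
drift = expected_drift(15)
print(drift)  # 4.5
```

Under this assumption, two revisions of the same test normed 15 years apart could differ by several points on the same population, which is one reason to look at scores per test type.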
VIQ test scores by test type for control subjects
PIQ test scores by test type for control subjects
The first decision that we can make from this analysis is to eliminate the unlabeled tests. We also exclude WISC_III and WISC_IV_FULL because they do not cover enough subjects.
PPVT and Ravens seem to give very dissimilar measures depending on the site they come from. On the other hand, WAIS, WASI and WISC, which are based on the Wechsler intelligence scale, seem to agree on the mean. I believe that this is the least risky choice among measures.
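One way to quantify how much a test type "agrees on the mean" across sites is to compute the mean score per (test type, site) pair and then the spread of these site means for each test type. The sketch below does this on invented control-subject scores; the site names "A" and "B", the field names and all values are placeholders.

```python
from collections import defaultdict
from statistics import mean

# Invented VIQ scores of control subjects, for illustration only.
records = [
    {"site": "A", "test": "WASI", "viq": 110},
    {"site": "A", "test": "WASI", "viq": 106},
    {"site": "B", "test": "WASI", "viq": 109},
    {"site": "A", "test": "PPVT", "viq": 95},
    {"site": "A", "test": "PPVT", "viq": 97},
    {"site": "B", "test": "PPVT", "viq": 120},
    {"site": "B", "test": "PPVT", "viq": 124},
]


def mean_by_test_and_site(records):
    """Mean score for every (test type, site) pair."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["test"], r["site"])].append(r["viq"])
    return {key: mean(values) for key, values in groups.items()}


def spread_across_sites(site_means):
    """For each test type, difference between its highest and lowest site mean."""
    by_test = defaultdict(list)
    for (test, _site), value in site_means.items():
        by_test[test].append(value)
    return {test: max(values) - min(values) for test, values in by_test.items()}


site_means = mean_by_test_and_site(records)
spread = spread_across_sites(site_means)
print(spread)
```

A small spread means the sites agree for that test type; a large one points to a site effect like the one described for PPVT and Ravens.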
A plot of IQ scores over sites is available here. It is not included in this post because it is very hard to interpret. Still, it is a plot to keep in mind because, at some sites like UM, autistic patients seem to have better scores than controls.
The ABIDE dataset seems to provide a lot of subjects and measures but, when we take a closer look at the data, we realize that we have to throw out at least half of them. This is because we have high standards: outliers can have a noticeable impact on the results of our algorithms.
To summarize our selection, we keep subjects that have IQ tests based on the Wechsler scale and pass the quality check. Quality check summary is available here.
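The whole selection can be sketched as the intersection of three filters: site, quality check and IQ test type. Everything in this example is hypothetical: the subject numbers, the set of subjects passing the quality check, and the test type labels, which stand in for whatever labels appear in the phenotypic file. The column names are assumed to follow that file.

```python
EXCLUDED_SITES = {"NYU", "CMU"}           # sites not reviewed yet
WECHSLER_TESTS = {"WAIS", "WASI", "WISC"}  # placeholder labels for Wechsler-based tests


def select_subjects(subjects, qc_passed_ids, allowed_tests, excluded_sites):
    """Keep subjects from reviewed sites that pass QC and have allowed IQ test types."""
    return [
        s for s in subjects
        if s["SITE_ID"] not in excluded_sites
        and s["SUB_ID"] in qc_passed_ids
        and s["VIQ_TEST_TYPE"] in allowed_tests
        and s["PIQ_TEST_TYPE"] in allowed_tests
    ]


# Invented subjects, for illustration only.
subjects = [
    {"SUB_ID": 1, "SITE_ID": "UM", "VIQ_TEST_TYPE": "WASI", "PIQ_TEST_TYPE": "WASI"},
    {"SUB_ID": 2, "SITE_ID": "UM", "VIQ_TEST_TYPE": "PPVT", "PIQ_TEST_TYPE": "RAVENS"},
    {"SUB_ID": 3, "SITE_ID": "NYU", "VIQ_TEST_TYPE": "WASI", "PIQ_TEST_TYPE": "WASI"},
    {"SUB_ID": 4, "SITE_ID": "YALE", "VIQ_TEST_TYPE": "WAIS", "PIQ_TEST_TYPE": "WAIS"},
    {"SUB_ID": 5, "SITE_ID": "YALE", "VIQ_TEST_TYPE": "", "PIQ_TEST_TYPE": ""},
    {"SUB_ID": 6, "SITE_ID": "LEUVEN", "VIQ_TEST_TYPE": "WAIS", "PIQ_TEST_TYPE": "WAIS"},
]
qc_passed_ids = {1, 2, 3, 5, 6}  # subject 4 fails the (made-up) quality check

selected = select_subjects(subjects, qc_passed_ids, WECHSLER_TESTS, EXCLUDED_SITES)
selected_ids = [s["SUB_ID"] for s in selected]
print(selected_ids)
```

Keeping the three criteria as separate arguments makes it easy to relax one of them later, for example once the NYU and CMU sites have been reviewed.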