MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: Reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths

February 20, 2009Jovicich J, Czanner S, Han X, Salat D, van der Kouwe A, Quinn B, Pacheco J, Albert M, Killiany R, Blacker D, Maguire P, Rosas D, Makris N, Gollub R, Dale A, Dickerson BC, Fischl BResearch Area, Publications, Other

Jovicich J, Czanner S, Han X, Salat D, van der Kouwe A, Quinn B, Pacheco J, Albert M, Killiany R, Blacker D, Maguire P, Rosas D, Makris N, Gollub R, Dale A, Dickerson BC, Fischl B

Neuroimage 2009 May;46(1):177-92

PMID: 19233293

Abstract

Automated MRI-derived measurements of in-vivo human brain volumes provide novel insights into normal and abnormal neuroanatomy, but little is known about measurement reliability. Here we assess the impact of image acquisition variables (scan session, MRI sequence, scanner upgrade, vendor and field strengths), FreeSurfer segmentation pre-processing variables (image averaging, B1 field inhomogeneity correction) and segmentation analysis variables (probabilistic atlas) on resultant image segmentation volumes from older (n=15, mean age 69.5) and younger (both n=5, mean ages 34 and 36.5) healthy subjects. The variability between hippocampal, thalamic, caudate, putamen, lateral ventricular and total intracranial volume measures across sessions on the same scanner on different days is less than 4.3% for the older group and less than 2.3% for the younger group. Within-scanner measurements are remarkably reliable across scan sessions, being minimally affected by averaging of multiple acquisitions, B1 correction, acquisition sequence (MPRAGE vs. multi-echo-FLASH), major scanner upgrades (Sonata-Avanto, Trio-TrioTIM), and segmentation atlas (MPRAGE or multi-echo-FLASH). Volume measurements across platforms (Siemens Sonata vs. GE Signa) and field strengths (1.5 T vs. 3 T) result in a volume difference bias but with a comparable variance as that measured within-scanner, implying that multi-site studies may not necessarily require a much larger sample to detect a specific effect. These results suggest that volumes derived from automated segmentation of T1-weighted structural images are reliable measures within the same scanner platform, even after upgrades; however, combining data across platform and across field-strength introduces a bias that should be considered in the design of multi-site studies, such as clinical drug trials. The results derived from the young groups (scanner upgrade effects and B1 inhomogeneity correction effects) should be considered as preliminary and in need for further validation with a larger dataset.