EEG preprocessing II: eye-artifacts, repairing and rejecting

The second part of this series demonstrate additional preprocessing steps. Specifically, it addresses the problem of eye artifacts which are omnipresent in EEG recordings. It also demonstartes a procedure for repairing and rejecting noise-contaminated channels and segments.
Author

Ole Bialas

Published

Friday, the 23rd of February, 2024

The previous post on preprocessing EEG presented a minimally invasive pipeline of procedures that are necessary in most EEG analyses. In this post I present additional steps that might be useful if the data is still not sufficiently cleaned. First, I will address eye blinks which is one of the most prevalent sources of artifacts in EEG recordings. After that, I’ll demonstrate a method to repair or remove segments of the data contaminated with noise.

Prerequisites

First, we need to install some packages that provide us with the equired preprocessing functions:

pip install mne meegkit pyprep autoreject

Then, we have to dowload the sample data from MNE Python and clean it using the steps described in the previous post on EEG preprocessing. The code below does exactly that — if you want a more detailed explanation, read the original post. Running the cell below produces the cleaned epochs to which we will apply further preprocessing, starting with the removal of eye artifacts.

Code
import numpy as np
from mne.io import read_raw_fif
from mne.datasets.sample import data_path
from mne import find_events
from mne.epochs import Epochs
from meegkit.detrend import detrend
from meegkit.dss import dss_line
from pyprep.ransac import find_bad_by_ransac

raw = read_raw_fif(data_path() / "MEG/sample/sample_audvis_raw.fif")
events = find_events(raw)
raw.pick(picks="eeg")
raw, events = raw.resample(150, events=events)
X = raw.get_data().T  # transpose so the data is organized time-by-channels
X, _, _ = detrend(X, order=1)
X, _, _ = detrend(X, order=6)
raw._data = X.T  # overwrite raw data
X, noise = dss_line(X, fline=60, sfreq=raw.info["sfreq"], nremove=3)
raw._data = X.T
bads, _ = find_bad_by_ransac(
    data=raw.get_data(),
    sample_rate=raw.info["sfreq"],
    complete_chn_labs=np.asarray(raw.info["ch_names"]),
    chn_pos=np.stack([ch["loc"][0:3] for ch in raw.info["chs"]]),
    exclude=[],
    corr_thresh=0.9,
)
raw_clean = raw.copy()
raw_clean.info["bads"] = bads
raw_clean.interpolate_bads()
raw_clean.set_eeg_reference("average", projection=True)  # compute the reference
raw.add_proj(raw_clean.info["projs"][0])
del raw_clean  # delete the copy
raw.apply_proj()  # apply the reference
event_id = {"auditory/left": 1, "auditory/right": 2}
epochs = Epochs(raw, events, event_id, tmin=-0.1, tmax=0.4, baseline=None, preload=True)

What are eye artifacts?

While it is often assumed that eye artifacts are the result of muscle activity, they are actually the result of a ionic gradient in the retinal pigment epithelium that makes the eye an electric dipole 1. Thus, moving the eyes and the dipole induces a change in voltage picked up by the sensors that is roughly proportional to the amplitude of the movement. Because this could overshadow the neural responses, many studies eliminate eye movements by making participants fixate a point during the experiment.

However, another kind of eye artifact may still occur - blinks. Eye blinks affect the measured voltage because the eye lid changes the resistance between the positively charged cornea and the forehead. Fortunately, these eye artifacts are largely independent of each other and the brain activity which makes them ideal candidates for independent component analysis (ICA) 2.

Automated component rejection

One could simply select and remove the eye blink component from the data. However, manual selection of components goes against the idea of an automated preprocessing pipeline. Instead, we can use the selected component as a template and classify new components as blinks by using the corrmap algorithm which selects components who’s correlation with the template exceeds some threshold 6. To do this we can can store the blink component’s index in the ICA.labels_ attribute and save the ICA as template.

import tempfile
temp_dir = tempfile.TemporaryDirectory()
template = ica.copy()
template.labels_['blinks'] = [0]
template.save(temp_dir.name + '/template_ica.fif')
Writing ICA solution to /tmp/tmp08xo0rwc/template_ica.fif...
Method fastica
Fit parameters algorithm=parallel
fun=logcosh
fun_args=None
max_iter=1000
Fit 56 iterations on epochs (11020 samples)
ICA components 32
Available PCA components 59
Channel types eeg
ICA components marked for exclusion

Now we can iterate through all entries in the .labels_ attribute and use corrmap to find components that are similar to the respective template. The first input for corrmap is the list of ICAs being processed. The second input is a tuple with the index of the ICA instance in the list and the component of that ICA being used as template. Similar components that are detected are stored in the .labels_ attribute of the respective ICA instance.

from mne.preprocessing import read_ica, corrmap
template = read_ica(temp_dir.name + '/template_ica.fif')

for key, value in template.labels_.items():
    corrmap([template, ica], (0, value[0]), label=key, threshold=0.85, plot=False)
Reading /tmp/tmp08xo0rwc/template_ica.fif ...
    Read a total of 1 projection items:
        Average EEG reference (1 x 60) active
Now restoring ICA solution ...
Ready.
Median correlation with constructed map: 1.000
At least 1 IC detected for each subject.

Of course, this example is completely circular because we applied corrmap to the same data we used for selecting the template in the first place. However, once selected, the same template can be applied to multiple recorings and even across experiments, given that the electrode layout is the same. After all artifact components have been identified, we can exclude them when applying the ICA to the sensor data.

bad_components = [value[0] for value in ica.labels_.values()]
epochs.load_data() # make sure data is loaded
epochs = ica.apply(epochs, exclude=bad_components)
Applying ICA to Epochs instance
    Applying projection operator with 1 vector (pre-whitener application)
    Transforming to ICA space (32 components)
    Zeroing out 1 ICA component
    Projecting back using 59 PCA components

When data must be rejected

Even with all the preprocessing steps discussed in this guide, some data can’t be saved. Sometimes, a channels loses contact with the scalp or a segment is noise-ridden, for example due to excessive movement. In those cases, we have to remove that data so it won’t contaminate the average response. Traditionally, EEG data is manually inspected, bad channels are interpolated and bad segments are annotated for rejection by hand. This is suboptimal for several reasons: first, scanning tens of hours of EEG recordings is tedious, time consuming and unfeasible for very large data sets. What’s more, the manual approach reduces reproducibility because the criteria for what counts as a bad channel or segment are subjective. Finally, it is often not necessary to interpolate a channel for the entire recording if it is bad for only a fraction.

Introducing autoreject

All of these problems are addressed by the autoreject algorithm 7, which is a procedure to identify and either repair or reject bad data segments. For each channel p, it estimates a peak-to-peak threshold τ. Each channel marks epochs as bad that exceeds their respective threshold. A trial is rejected if a fraction κ of all channels marks it as bad. If less than κ channels are bad, up to ρ are interpolated to repair the epoch. All parameters, τ κ and ρ are estimated from the data using cross-validation. Thus the optimal set of parameters are those that minimize the difference between testing and training data. In this sense, autoreject acts similar to a human observer identifying outliers in the data. After installing the module with pip install autoreject, we can simply apply it to the epoched data. I also plot the rejection log to visualize the effect of autoreject on the data.

from autoreject import AutoReject

fig, ax = plt.subplots()
ar = AutoReject(verbose=False)
epochs, log = ar.fit_transform(epochs, return_log=True)
log.plot(orientation="horizontal", ax=ax);
Dropped 2 epochs: 32, 40

Matrix displaying the results of autoreject. Each column is one trial, and blue squares indicate interpolated sensors. Red columns were deemed “bad” and marked for rejection.

Matrix displaying the results of autoreject. Each column is one trial, and blue squares indicate interpolated sensors. Red columns were deemed “bad” and marked for rejection.

In the plot below, blue marks channels that have been interpolated within a given epoch. Red marks channels that have been deemed bad but not interpolated because the number of bad channels exceeded ρ. The red column at epoch 40 indicates that this epoch has been rejected because the number of bad channels exceeded κ.

Conclusion

The repertoire of preprocessing methods outlined in this and the previous post is sufficient to clean data for most EEG projects. Importantly, all steps can be assembled into a fully automated pipeline. In the next and final post in this series, I will share a such a pipeline and demonstrate a method for estimating the effectiveness of each step.

Footnotes

Back to top

Footnotes

  1. This is referred to as the corneo-retinal dipole. A explanation of the underlying physiology can be found in: Arden, G. B., & Constable, P. A. (2006). The electro-oculogram. Progress in retinal and eye research, 25(2), 207-248.↩︎

  2. A detailed investigation of eye-artifacts and their detection via ICA can be found in: Plöchl, M., Ossandón, J. P., & König, P. (2012). Combining EEG and eye tracking: identification, characterization, and correction of eye movement artifacts in electroencephalographic data. Frontiers in human neuroscience, 6, 278.↩︎

  3. An in-depth explanation of ICA is beyond the scope of this post but can be found in: Makeig, S., Bell, A., Jung, T. P., & Sejnowski, T. J. (1995). Independent component analysis of electroencephalographic data. Advances in neural information processing systems, 8.↩︎

  4. A highpass between 1 and 2 Hz before ICA is optimal, see Winkler, I., Debener, S., Müller, K. R., & Tangermann, M. (2015, August). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 4101-4105). IEEE.↩︎

  5. The absolute sign of the component is meaningless and may change when ICA is performed repeatedly.↩︎

  6. A detailed description of the corrmap algorithm can be found in Viola, F. C., Thorne, J., Edmonds, B., Schneider, T., Eichele, T., & Debener, S. (2009). Semi-automatic identification of independent components representing EEG artifact. Clinical Neurophysiology, 120(5), 868-877.↩︎

  7. A detailed description of the autoreject algorithm can be found in Jas, M., Engemann, D. A., Bekhti, Y., Raimondo, F., & Gramfort, A. (2017). Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage, 159, 417-429.↩︎