Tutorials (Python)

Run the tutorial in Colab:

This page describes the local pyDANT workflow. If you want to try pyDANT in a browser without local setup or pre-downloading data, use the Colab tutorial above. For local analysis, make sure pyDANT is installed correctly. If you have not installed pyDANT yet, please refer to the Installation section.

Prepare the data

An example dataset is available here. You can download it and follow the steps below to run the full example. You can also use your own dataset by following the same workflow.

To use pyDANT, organize your data in a folder with the following structure:

data_folder
├── channel_locations.npy
├── waveform_all.npy
├── session_index.npy
├── peth.npy (optional)
├── channel_shanks.npy (required for multi-shank data)
└── spike_times/
    ├── Unit0.npy
    ├── Unit1.npy
    ├── Unit2.npy
    ├── ...
    └── UnitN.npy

The data files should follow the formats below:

Filename	Shape	Explanation
`session_index.npy`	(n_unit,)	session number of each unit. Values must start at 1 (for compatibility with MATLAB) and be consecutive integers with no gaps.
`waveform_all.npy`	(n_unit, n_channel, n_sample)	the mean waveform of each unit in μV. All units must share the same set of channels
`channel_locations.npy`	(n_channel, 2)	x and y coordinates of each channel in μm. The y coordinate typically represents depth
`channel_shanks.npy`	(n_channel,)	optional for single-shank data and required for multi-shank data. It stores the shank ID of each channel
`peth.npy`	(n_unit, n_point)	optional, peri-event time histogram for each unit
`spike_times/UnitX.npy`	(n_spike,)	spike times in milliseconds
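Before running pyDANT, it can help to check a data folder against the table above. The following is a minimal sketch, not part of pyDANT: `check_data_folder` is a hypothetical helper, and it assumes that the spike time files are named `Unit0.npy` upward with one file per row of `waveform_all.npy`. The toy dataset it builds contains made-up values only.

```python
import tempfile
from pathlib import Path

import numpy as np


def check_data_folder(folder):
    """Return a list of problems found in a pyDANT-style input folder."""
    folder = Path(folder)
    problems = []

    session_index = np.load(folder / "session_index.npy")
    waveform_all = np.load(folder / "waveform_all.npy")
    channel_locations = np.load(folder / "channel_locations.npy")
    n_unit = session_index.shape[0]

    # Session numbers must be 1, 2, ..., n_session with no gaps.
    sessions = np.unique(session_index)
    if not np.array_equal(sessions, np.arange(1, len(sessions) + 1)):
        problems.append("session_index must be consecutive integers starting at 1")

    # waveform_all: (n_unit, n_channel, n_sample)
    if waveform_all.ndim != 3 or waveform_all.shape[0] != n_unit:
        problems.append("waveform_all must have shape (n_unit, n_channel, n_sample)")
    # channel_locations: (n_channel, 2)
    elif channel_locations.shape != (waveform_all.shape[1], 2):
        problems.append("channel_locations must have shape (n_channel, 2)")

    # Assumption: one spike_times/Unit<i>.npy file per row of waveform_all.
    for i in range(n_unit):
        path = folder / "spike_times" / f"Unit{i}.npy"
        if not path.exists():
            problems.append(f"missing {path.name}")
        elif np.load(path).ndim != 1:
            problems.append(f"{path.name} must have shape (n_spike,)")
    return problems


# Toy dataset: 4 units, 8 channels, 30 samples, 2 sessions (all values made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "spike_times").mkdir()
    np.save(root / "session_index.npy", np.array([1, 1, 2, 2]))
    np.save(root / "waveform_all.npy", np.zeros((4, 8, 30)))
    np.save(root / "channel_locations.npy", np.zeros((8, 2)))
    for i in range(4):
        # Spike times are stored in milliseconds.
        np.save(root / "spike_times" / f"Unit{i}.npy", np.array([10.0, 25.5, 40.2]))
    problems = check_data_folder(root)

print(problems)
```

To check your own data, call `check_data_folder` with the path to your data folder. An empty list means none of these basic checks failed.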

If you include the optional peth.npy file, all units should have the same PETH length. If some event/time bins are missing for a unit, fill those bins with NaN.
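One way to build a `peth.npy` array of uniform length is sketched below. `stack_peth` is a hypothetical helper, not a pyDANT function, and it assumes that the missing bins fall at the end of each unit's trace. If bins are missing elsewhere in your data, place the NaN values at the corresponding positions instead.

```python
import numpy as np


def stack_peth(peth_per_unit, n_point):
    """Stack per-unit PETHs into an (n_unit, n_point) array, NaN-filling missing bins."""
    peth = np.full((len(peth_per_unit), n_point), np.nan)
    for i, trace in enumerate(peth_per_unit):
        trace = np.asarray(trace, dtype=float)
        if trace.size > n_point:
            raise ValueError(f"unit {i} has more than {n_point} bins")
        # Assumption: missing bins are the trailing ones.
        peth[i, :trace.size] = trace
    return peth


# Two toy units: the second one is missing its last bin.
peth = stack_peth([[1.0, 2.0, 3.0], [4.0, 5.0]], n_point=3)
print(peth.shape)
# np.save("peth.npy", peth)  # write into your data folder
```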

For multi-shank probes, such as Neuropixels 2.0 probes with multiple shanks, provide channel_shanks.npy in the input data folder. During preprocessing, pyDANT generates unit_shanks.npy by assigning each unit to the shank of its peak channel.
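pyDANT creates `unit_shanks.npy` itself, so you do not need to compute it. The sketch below only illustrates the idea of assigning each unit to the shank of its peak channel. It assumes that the peak channel is the channel with the largest peak-to-peak amplitude, which may differ from the definition pyDANT uses internally. `assign_unit_shanks` is a hypothetical name.

```python
import numpy as np


def assign_unit_shanks(waveform_all, channel_shanks):
    """Map each unit to the shank ID of its peak channel.

    waveform_all: (n_unit, n_channel, n_sample), channel_shanks: (n_channel,)
    """
    # Assumption: peak channel = largest peak-to-peak amplitude.
    amplitude = waveform_all.max(axis=2) - waveform_all.min(axis=2)
    peak_channel = np.argmax(amplitude, axis=1)
    return channel_shanks[peak_channel]


# Toy data: 2 units, 4 channels (2 per shank), 3 samples.
waveform_all = np.zeros((2, 4, 3))
waveform_all[0, 1] = [0.0, -80.0, 20.0]  # unit 0 peaks on channel 1
waveform_all[1, 3] = [0.0, -60.0, 10.0]  # unit 1 peaks on channel 3
channel_shanks = np.array([0, 0, 1, 1])

unit_shanks = assign_unit_shanks(waveform_all, channel_shanks)
print(unit_shanks)
```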

Crucially, the waveforms used in this analysis must not be whitened, unlike the waveforms processed by Kilosort. Do not take waveforms directly from temp_wh.dat, and do not use whitening_mat_inv.npy or whitening_mat.npy from Kilosort2.5 / Kilosort3 to “unwhiten” the data, because these matrices do not correspond to Kilosort’s original whitening process (see this issue).

We recommend analyzing data from different brain regions (e.g., cortex and striatum) separately, as they may exhibit distinct drifts and neuronal properties. Please generate a separate data folder for each brain region.

Copy settings.json and mainDANT.py from the pyDANT package into your data folder. For multi-shank data, also copy mainDANT_MultiShank.py. A typical layout is shown below:

data_folder
├── settings.json
├── mainDANT.py
├── mainDANT_MultiShank.py (for multi-shank data)
├── channel_locations.npy
├── waveform_all.npy
├── session_index.npy
├── peth.npy (optional)
├── channel_shanks.npy (required for multi-shank data)
└── spike_times/
    ├── Unit0.npy
    ├── Unit1.npy
    ├── Unit2.npy
    ├── ...
    └── UnitN.npy

Edit the settings

To run pyDANT, edit the settings.json file in your data folder first. At a minimum, you should specify the following fields:

{
    "path_to_data": ".", // path to the input data folder
    "output_folder": ".\\DANT_Output", // output folder
}

If you do not want to use the PETH feature, remove it from both the motionEstimation and clustering sections. After editing, the settings can look like this:

// parameters for motion estimation
"motionEstimation":{
    "features": [
        ["Waveform", "AutoCorr"]
    ], // features used for motion estimation each iteration. Choose from "Waveform", "AutoCorr", "ISI", "PETH"
    "max_iter": 15, // maximum number of motion estimation iterations
    "repeat_last_feature_set": true, // whether to keep reusing the last feature set until stop_early triggers or max_iter is reached
    "stop_early": true // whether to terminate the motion estimation loop early if the number of matched unit pairs fails to increase
},

and

// parameters for clustering
"clustering":{
    "max_distance": 100, // um. Unit pairs with distance larger than this value in Y direction will be considered as different clusters
    "features": ["Waveform", "AutoCorr"], // features used for final clustering. Choose from "Waveform", "AutoCorr", "ISI", "PETH"
    "n_iter": 10 // number of iterations for the clustering algorithm
},

Also edit mainDANT.py to specify the path to the settings file:

path_settings = r'./settings.json' # Path to your settings.json file

To learn more about the settings, please refer to the Change default settings section. Careful tuning can help improve tracking results.

Run the code

Run mainDANT.py in your Python environment from the terminal or command prompt:

python mainDANT.py

For multi-shank data, run mainDANT_MultiShank.py instead:

python mainDANT_MultiShank.py

The multi-shank entry point calls runDANTMultiShank(user_settings). It processes each shank separately and then merges the per-shank results into the root output folder.

The tracking results should appear in the output folder specified in settings.json. A typical output layout looks like this:

data_folder
├── settings.json
├── mainDANT.py
├── mainDANT_MultiShank.py (for multi-shank data)
├── channel_locations.npy
├── waveform_all.npy
├── session_index.npy
├── peth.npy (optional)
├── channel_shanks.npy (required for multi-shank data)
├── spike_times/
└── DANT_Output/
    ├── IdxCluster.npy
    ├── ClusterMatrix.npy
    ├── SimilarityMatrix.npy
    ├── ...
    └── Figures/

For multi-shank runs, the root output folder also contains unit_shanks.npy and merged global outputs such as Output.npz, ClusteringResults.npz, ClusterMatrix.npy, IdxCluster.npy, MatchedPairs.npy, the similarity matrices, and waveforms_corrected.npy. The merged Output.npz includes IdxUnit and IdxShank fields. Complete per-shank outputs are saved under Shank<ID>/ subfolders.

Understand the output

Along with several intermediate files, the main output is stored in the DANT_Output folder. The most important files are listed below:

Filename	Shape	Explanation
`IdxCluster.npy`	(n_unit,)	cluster index for each unit.
`ClusterMatrix.npy`	(n_unit, n_unit)	cluster assignment matrix. `ClusterMatrix[i, j] = 1` means units `i` and `j` are in the same cluster.
`MatchedPairs.npy`	(n_pairs, 2)	unit indices of all matched pairs.
`SimilarityMatrix.npy`	(n_unit, n_unit)	weighted sum of the similarity between each pair of units.

The most important file is IdxCluster.npy, which assigns a cluster ID to each unit. Units that share a cluster ID were matched to each other, and unmatched units are marked with -1. You can use it to extract matched units across sessions. To learn more about the output, please refer to the Input and Output section.
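As a starting point for cross-session analysis, the sketch below groups units by cluster ID and skips the unmatched ones. It uses toy arrays in place of the real files. In practice you would load `IdxCluster.npy` from your output folder and `session_index.npy` from your data folder with `np.load`. `group_units_by_cluster` is a hypothetical helper, and the unit indices it returns are assumed to be zero-based rows of the input arrays.

```python
from collections import defaultdict

import numpy as np


def group_units_by_cluster(idx_cluster):
    """Return {cluster_id: [unit indices]}, skipping unmatched units (-1)."""
    clusters = defaultdict(list)
    for unit, cluster in enumerate(idx_cluster):
        if cluster != -1:
            clusters[int(cluster)].append(unit)
    return dict(clusters)


# Toy stand-ins for IdxCluster.npy and session_index.npy (values made up).
idx_cluster = np.array([1, 2, -1, 1, 2, -1])
session_index = np.array([1, 1, 1, 2, 2, 2])

clusters = group_units_by_cluster(idx_cluster)
for cluster, units in clusters.items():
    # Show which sessions each tracked neuron appears in.
    print(cluster, units, session_index[units].tolist())
```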

Tracking is complete. You can now move on to cross-session analysis with the tracked neurons.