Neuron Segmentation via a Frequency and Spatial Domain–Integrated Encoder–Decoder Network
Abstract
Three-dimensional (3D) segmentation of neurons is a crucial step in the digital reconstruction of neurons and serves as an important foundation for brain science research. In neuron segmentation, U-Net and its variants have shown promising results. However, because they focus primarily on learning spatial domain features, these methods overlook the abundant global information in the frequency domain. Furthermore, issues such as insufficient processing of contextual features by skip connections and redundant features resulting from simple channel concatenation in the decoder limit the accurate segmentation of neuronal fiber structures. To address these problems, we propose an encoder–decoder segmentation network that integrates the frequency and spatial domains to enhance neuron reconstruction. To simplify the segmentation task, we first divide the neuron images into neuronal cubes. We then design 3D FreqSNet, which leverages both frequency and spatial domain features to segment the target neurons within these cubes. Next, we introduce a multiscale attention fusion module (MAFM) that utilizes spatial and channel position information to enhance contextual feature representation. In addition, a feature selection module (FSM) is incorporated to adaptively select discriminative features from both the encoder and decoder, increasing the weight on critical neuron locations and significantly improving segmentation performance. Finally, the segmented nerve fiber cubes are assembled into complete neurons and digitally reconstructed using available neuron tracing algorithms. In experiments, we evaluated 3D FreqSNet on two challenging 3D neuron image datasets (the BigNeuron dataset and the CWMBS dataset). Compared to other advanced segmentation methods, 3D FreqSNet extracts target neurons more accurately from noisy and weakly visible neuronal fiber images, effectively improving the performance of 3D neuron segmentation and reconstruction.
1. Introduction
Neuronal morphology plays a significant role in the field of neuroscience. 3D digital reconstruction of neurons aims to establish tree-like models of 3D neuronal morphology from optical microscopy images, which is crucial for understanding the mechanisms of the nervous system and the functions of the brain [1]. However, due to the limitations of current imaging technologies, the quality of the generated 3D brain neuron images varies significantly. As shown in Figure 1, intense background noise and weak neuron signals pose significant challenges to existing automatic or semiautomatic brain neuron reconstruction algorithms [2–7], making it difficult to achieve accurate neuron reconstruction results. Neuron segmentation, as a preprocessing step, can effectively and automatically denoise images and enhance weak signals, making the entire tracing process easier and more precise. In the early stages, traditional neuron segmentation algorithms were adopted, such as threshold segmentation [8], maximum likelihood global trees [9], 3D tubular models [10], and data-driven methods that encode context-tagged information [11]. However, these methods often require parameter tuning and generally only work well for images with high contrast and clean backgrounds, making them difficult to apply to intricate neuron images. In recent years, deep learning–based methods have gained significant traction in the field of neuron image segmentation. Among them, convolutional neural networks (CNNs) are particularly representative, with 3D U-Net [12] and its variants [13–15] being the most notable examples. However, these advanced algorithms still struggle to precisely extract neuron information from challenging images. First, current deep learning methods typically extract semantic features only in the spatial domain and therefore have limited ability to learn global contextual information. Frequency domain features encapsulate crucial global information about the image; combining them with spatial domain characteristics yields a richer, more comprehensive feature representation, thereby enhancing the accuracy of neuron segmentation. Second, by introducing skip connections, the U-Net architecture can effectively merge low-level and high-level image features. However, this naive fusion and the constraints of the receptive field easily result in insufficient processing of multiscale contextual features. Furthermore, at the end of the skip connections, where the feature maps of the decoder and the symmetric encoder are merged, simple channel concatenation can lead to feature redundancy and excessive computational overhead.
To address these issues, we propose 3D FreqSNet, a frequency and spatial domain integrated encoder–decoder segmentation network for 3D neuron images. The main contributions of this work are summarized as follows:
- 1.
We propose a neuron segmentation method built on an encoder–decoder network that integrates frequency and spatial domain information for the segmentation of large-scale neuronal images.
- 2.
We propose the frequency–spatial representation aggregation (FSRA) block, which performs feature learning in both the frequency and spatial domains, preserving the ability to learn global information while maintaining the integrity of local information.
- 3.
We propose the multiscale attention fusion module (MAFM) and the feature selection module (FSM). The former enhances contextual feature representation by exploiting spatial and channel position information, while the latter adaptively selects discriminative features, increasing the weight of key neuronal voxels and avoiding feature redundancy.
- 4.
Extensive experiments on two 3D neuronal datasets demonstrate that our segmentation results achieve state-of-the-art performance across five different neuron reconstruction algorithms, while striking a balance between computational cost and reconstruction performance.
2. Related Works
2.1. 3D Neuronal Reconstruction
Neuronal morphology reconstruction, also known as neuron tracing, aims to extract quantitative data on neuronal morphology from 3D microscopic images of neurons and establish a 3D morphological structure model of neurons. This process involves identifying all nodes of neural fibers, establishing the topological structure between these nodes, and measuring the radius of every node. In recent years, numerous experts and scholars in neuron-related fields have developed various automatic or semiautomatic neuron tracing techniques. For instance, Peng et al. developed APP [16] and APP2 [2], Zhou et al. introduced TReMAP [17], Chen et al. presented SmartTracing [18], Zhao et al. proposed Tube Models [19], Ming et al. developed MOST [3], and the BigNeuron project [20] compiled dozens of tracing algorithms and integrated them into the Vaa3D software [21]. In general, these algorithms leverage graph theory principles to mathematically model neuronal morphology and neural fibers. However, they face two major challenges in practical applications: firstly, they may erroneously identify background noise as neural fibers; secondly, they may overlook a significant number of genuinely existing neural fibers, as illustrated in Figure 1. Although these tracing algorithms are highly sensitive to image quality, they often have to contend with a low signal-to-noise ratio due to limitations in current neuronal imaging technologies. Therefore, there is an urgent need for an automatic and rapid algorithm that can reduce the influence of noise, enhance weak neuronal structures, and ultimately improve the performance of neuronal reconstruction.
2.2. Deep Learning–Based Neuron Segmentation Methods
Before performing neuron reconstruction, accurate neuron segmentation is an effective strategy to enhance the performance of neuron tracing algorithms. In recent years, deep learning–based methods have demonstrated superior performance in medical image analysis without the need for manual feature extraction, and deep learning has also been introduced into neuron image segmentation. For example, Liu et al. [22] proposed a novel 2.5D framework that uses a stack of adjacent 2D slices of a 3D image as the input to train a 2D CNN, effectively mitigating substantial background noise in neuron images. Nevertheless, this approach does not take full advantage of the depth information in the 3D image because of its 2D convolutions, potentially leading to inaccurate segmentation results. Li et al. [23] introduced the first 3D residual deep network for neuron segmentation to improve 3D neuron reconstruction performance. Liu et al. [13] further enhanced neuron structures and removed image noise by improving upon the V-Net architecture. However, due to the significant imbalance between foreground and background classes in neuron images, this network fails to effectively capture foreground features. To mitigate the aliasing effects caused by downsampling, Li and Shen [14] designed a 3D wavelet-integrated network to remove data noise; however, suppressing high-frequency information at each downsampling stage leads to excessive loss of edge details. The anisotropy of neuron images results in incomplete preservation of spatial information, causing the loss of some fine neuronal structures. To address this issue, Yang et al. [24] proposed a two-stage 3D neuron segmentation method: in the first stage, an FCN is trained to obtain a neuron segmentation map, which is then used by a Hessian-based repair model to repair broken structures. However, this repair method is complex and time-consuming, and it cannot effectively repair line segments that are too far apart. To achieve clear edge segmentation in vessel images, Xia et al. [25] enhanced the weights of edge voxels through a reverse edge attention block and an edge optimization loss. Liu et al. [26] proposed a novel adaptive learning network that utilizes classification results to guide neuron segmentation, reducing computational complexity and enabling real-time segmentation; however, these advantages often come at the cost of some model accuracy. Furthermore, existing neuron segmentation algorithms primarily focus on acquiring spatial domain information while neglecting the importance of frequency domain information. In general computer vision applications, extracting features in the frequency domain has been demonstrated to be a powerful approach [27, 28]. For instance, GFNet [28] uses the 2D discrete Fourier transform to replace the self-attention mechanism in ViT [29], employing global filtering layers in the frequency domain to exchange information between tokens and thereby capture long-range dependencies more effectively. However, the frequency domain remains underexplored in 3D semantic segmentation. In the spatial domain, the boundaries between segmentation objects and backgrounds are often blurred, whereas in the frequency domain, objects located at different frequencies can be easily distinguished. Inspired by this, we attempt to leverage frequency domain information to improve the performance of 3D neuron image segmentation.
To provide a concise and to-the-point overview, Table 1 lists the usage methods of related work in a tabular form. This table emphasizes the advantages and disadvantages of each method compared to our current method and indicates whether the method belongs to the frequency domain or spatial domain.
References | Methodology used | Advantages | Disadvantages | Frequency or spatial domain |
---|---|---|---|---|
[22] | 2.5D CNN | Easy to train and low computational complexity | Ignoring 3D spatial information | Spatial domain |
[23] | 3D CNNs | The first 3D residual deep network for neuron segmentation | Inability to effectively segment complex neuron images | Spatial domain |
[13] | The improved V-Net | Using anisotropic convolutional kernels and varying the number of layers to adapt to neuronal datasets | The network is unable to capture foreground features efficiently | Spatial domain |
[14] | 3D WaveUNet | The first 3D wavelet integrated network | Suppressing high-frequency information causes the network to excessively lose edge details | Spatial domain |
[24] | FCN, a ray-shooting model and a Hessian-repair model | Effectively applied to challenging datasets contaminated by noise or containing weak filament signals | The repair method is complicated, time-consuming, and difficult to apply to line segments with excessively large distances. | Spatial domain |
[25] | ER-Net | Being able to effectively extract spatial edge information | The overall performance is not high, and the computational load is heavy | Spatial domain |
[26] | ADTL-Net | Short inference time | Sacrificing model accuracy | Spatial domain |
[27] | Detecting camouflaged objects in the frequency domain | Introducing frequency domain perception cues into CNN models | Frequency domain unexplored in 3D | Frequency domain |
[28] | GFNet | 2D DFT replaces the self-attention mechanism, which is simple and computationally efficient | Frequency domain underexplored in 3D segmentation | Frequency domain |
[30] | SwinUNETR | Excellent global information capture capability | High computational complexity | Spatial domain |
[31] | 3D UX-Net | Large volumetric convolutional kernel enhances capture of spatial information | High computational resource requirements and relative complexity of the training process | Spatial domain |
3. Methods
3.1. The Method Pipeline
In this study, we developed a pipeline for our 3D neuron segmentation method, as detailed in Figure 2. The method comprises four crucial steps. A is partition: the long nerve fibers of a neuron may be widely dispersed over large brain regions, which makes 3D segmentation of large-scale images computationally expensive. To reduce this complexity, we cut the neuron image into small cubes with dimensions of 32 voxels in depth (z-axis), 128 in height (y-axis), and 128 in width (x-axis). B is segmentation, where the trained segmentation network segments the neuronal cubes. C is assembling, where we integrate the segmented neural fibers according to their spatial positions within the neuron image, completing the 3D neuron segmentation. D is reconstruction, where we use the segmented neuron images as the basis and apply five available tracing algorithms for neuron reconstruction.
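To make steps A and C concrete, the following is a minimal sketch of cutting a (z, y, x) volume into non-overlapping 32 × 128 × 128 cubes and stitching the segmented cubes back together; the function names and the zero-padding of border cubes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def partition(volume, cube=(32, 128, 128)):
    """Split a 3D neuron image (z, y, x) into non-overlapping cubes.

    Edge cubes are zero-padded so that every block has the same shape.
    Returns the cubes together with their origin coordinates.
    """
    dz, dy, dx = cube
    Z, Y, X = volume.shape
    padded = np.pad(volume, ((0, -Z % dz), (0, -Y % dy), (0, -X % dx)))
    cubes, origins = [], []
    for z in range(0, padded.shape[0], dz):
        for y in range(0, padded.shape[1], dy):
            for x in range(0, padded.shape[2], dx):
                cubes.append(padded[z:z + dz, y:y + dy, x:x + dx])
                origins.append((z, y, x))
    return cubes, origins

def assemble(cubes, origins, shape, cube=(32, 128, 128)):
    """Place (segmented) cubes back at their origins and crop to the original shape."""
    dz, dy, dx = cube
    padded_shape = [s + (-s % c) for s, c in zip(shape, cube)]
    out = np.zeros(padded_shape, dtype=cubes[0].dtype)
    for block, (z, y, x) in zip(cubes, origins):
        out[z:z + dz, y:y + dy, x:x + dx] = block
    return out[:shape[0], :shape[1], :shape[2]]
```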

3.2. Overall Structure
The overall architecture of the proposed segmentation method is shown in Figure 3. The network employs a symmetric encoder–decoder framework. In the encoder, the first three layers each consist of a frequency–spatial representation aggregation (FSRA) block followed by a standard convolution with a kernel size of 2 and a stride of 2. After each encoder layer, the feature dimension doubles while the resolution halves. The skip connections use a MAFM to perform cross-scale spatial-channel feature learning, transferring rich contextual information to the decoder. The decoder comprises three layers, each consisting of an FSM and a residual block [32]; subsequently, a deconvolution with a kernel size of 2 and a stride of 2 is applied for upsampling. Contrary to the encoder, the channel dimension of the feature maps halves after each decoder layer while the resolution doubles.
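As a rough PyTorch sketch of this symmetric layout, the skeleton below follows only the kernel-2, stride-2 downsampling/upsampling and the doubling/halving of channels described above; the channel widths, normalization choices, decoder ordering, and the plain residual blocks standing in for the FSRA, MAFM, and FSM modules are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain 3D residual block, used here only as a placeholder stage."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.InstanceNorm3d(ch), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1), nn.InstanceNorm3d(ch))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class EncoderDecoderSkeleton(nn.Module):
    """Three-level symmetric encoder-decoder: channels double while the
    resolution halves in the encoder, and the reverse in the decoder."""
    def __init__(self, in_ch=1, base=16, n_classes=2):
        super().__init__()
        self.stem = nn.Conv3d(in_ch, base, 3, padding=1)
        chs = [base, base * 2, base * 4]
        self.enc = nn.ModuleList([ResBlock(c) for c in chs])
        self.down = nn.ModuleList([nn.Conv3d(c, c * 2, kernel_size=2, stride=2) for c in chs])
        self.bottom = ResBlock(base * 8)
        self.up = nn.ModuleList([nn.ConvTranspose3d(c * 2, c, kernel_size=2, stride=2)
                                 for c in reversed(chs)])
        self.dec = nn.ModuleList([ResBlock(c) for c in reversed(chs)])
        self.head = nn.Conv3d(base, n_classes, 1)

    def forward(self, x):
        x = self.stem(x)
        skips = []
        for enc, down in zip(self.enc, self.down):
            x = enc(x)
            skips.append(x)          # skip connection (the MAFM would refine these)
            x = down(x)              # kernel-2, stride-2 strided convolution
        x = self.bottom(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x) + skip         # simple fusion; the paper's FSM selects features instead
            x = dec(x)
        return self.head(x)
```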

3.3. Frequency–Spatial Representation Aggregation (FSRA) Block

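As an illustrative sketch only, and not the paper's exact FSRA design, a frequency–spatial dual-branch block can pair a learnable global filter applied to the 3D FFT of the feature map (in the spirit of GFNet [28]) with a standard 3 × 3 × 3 spatial convolution, the two branches examined in the ablation study of Section 4.8.2. The fixed filter size (tied to the input cube dimensions), the normalization layers, and the aggregation by addition are assumptions.

```python
import torch
import torch.nn as nn

class FreqSpatialBlock(nn.Module):
    """Hedged sketch of a frequency-spatial dual-branch block.

    Frequency branch: a learnable global filter applied to the 3D FFT of the
    feature map (global context). Spatial branch: a 3x3x3 convolution (local
    detail). The two branches are aggregated by addition with a residual path.
    """
    def __init__(self, ch, depth, height, width):
        super().__init__()
        # complex-valued global filter over the half-spectrum produced by rfftn
        self.filter = nn.Parameter(torch.randn(ch, depth, height, width // 2 + 1, 2) * 0.02)
        self.spatial = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.InstanceNorm3d(ch), nn.ReLU(inplace=True))
        self.norm = nn.InstanceNorm3d(ch)

    def forward(self, x):
        # frequency branch: FFT -> elementwise global filter -> inverse FFT
        freq = torch.fft.rfftn(x, dim=(2, 3, 4), norm="ortho")
        freq = freq * torch.view_as_complex(self.filter)
        freq = torch.fft.irfftn(freq, s=x.shape[2:], dim=(2, 3, 4), norm="ortho")
        # spatial branch: standard 3x3x3 convolution
        spat = self.spatial(x)
        # aggregate frequency and spatial representations
        return torch.relu(x + self.norm(freq + spat))
```

Note that such a learnable frequency filter is defined for a fixed cube size (e.g., 32 × 128 × 128), which fits the cube-based pipeline of Section 3.1.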
3.4. MAFM

3.5. FSM

3.6. Loss Function
We train the network with a weighted combination of binary cross-entropy (BCE) loss and Dice loss; the relative weights of the two loss terms are controlled by the coefficient α in order to achieve the best prediction results.
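A minimal sketch of the α-weighted combination of BCE and Dice loss is given below; the exact form of the combination (i.e., which term α multiplies) is an assumption, and α = 0.4 is the best setting reported in Section 4.2.

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Sketch of an alpha-weighted combination of BCE and Dice loss:
    loss = alpha * BCE + (1 - alpha) * Dice (assumed combination form)."""
    def __init__(self, alpha=0.4, eps=1e-6):
        super().__init__()
        self.alpha = alpha
        self.eps = eps
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, target):
        bce = self.bce(logits, target.float())
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum()
        dice = 1.0 - (2.0 * inter + self.eps) / (prob.sum() + target.sum() + self.eps)
        return self.alpha * bce + (1.0 - self.alpha) * dice
```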
4. Experiment and Result
4.1. Datasets
- 1.
BigNeuron Dataset: To assess the performance of the proposed model, we use the gold-166 dataset from the BigNeuron project [20], which encompasses a variety of species such as fish, humans, and mice. The dimensions of the image stacks vary, for example 2047 × 890 × 23, 724 × 1024 × 33, and 1018 × 543 × 29. We selected 68 neuron images of different types, randomly using 3/4 of them for training and 1/4 for testing.
- 2.
Complex Whole Mouse Brain Sub-Image (CWMBS): In contrast to the BigNeuron dataset, the CWMBS dataset [25] poses a greater challenge due to the notable variations in voxel characteristics across images. It comprises 83 images with complex background noise and 162 images featuring intricate filamentous structures, all with dimensions of 256 × 256 × 256 voxels and a spatial resolution of 0.2 × 0.2 × 1.0 μm/voxel. These images are derived from a complete mouse brain and are professionally labeled. We selected 20 images with cluttered background noise and 20 images with fine fibrous structures as the test set, while the rest were used for training.
- 3.
Data Labeling: When training deep learning segmentation networks, a large number of accurate labels provide correct guidance to the network. The brain neuron datasets used here provide an expert manual reconstruction for each neuron image, and each record in the corresponding SWC file contains five attributes of a neuron node, namely, index, type, location (x, y, and z coordinates), radius, and parent node index [35]. Therefore, the Euclidean distance transform (EDT) and this neuron information can be used to automatically generate a segmentation label for each neuron image. The value of each voxel in the EDT represents the shortest geometric distance to the target point set, as defined in the following equation.
\[ \mathrm{EDT}(x_1, y_1, z_1) = \min_{(x_2, y_2, z_2) \in C} d\big((x_1, y_1, z_1), (x_2, y_2, z_2)\big) \]
where (x1, y1, z1) denotes the 3D coordinates of a voxel in the neuron image, C denotes the set of target points formed by the points on the neuron centerline, and (x2, y2, z2) denotes the 3D coordinates of the voxels in that set. The Euclidean distance is defined as shown in the following equation.
\[ d\big((x_1, y_1, z_1), (x_2, y_2, z_2)\big) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} \]
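As a hedged sketch of this labeling step, the code below uses scipy's Euclidean distance transform. It treats only the SWC node positions as the centerline point set C (in practice the segments between parent and child nodes would also be rasterized) and labels a voxel as foreground when its distance to the nearest centerline point does not exceed that point's radius, which is an assumed rule.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def labels_from_swc(nodes, shape):
    """Sketch: generate a voxel-wise segmentation label from SWC nodes.

    nodes: iterable of (x, y, z, radius) read from the manual SWC reconstruction.
    shape: (Z, Y, X) shape of the neuron image.
    """
    centerline = np.zeros(shape, dtype=bool)
    radius = np.zeros(shape, dtype=np.float32)
    for x, y, z, r in nodes:
        zi, yi, xi = int(round(z)), int(round(y)), int(round(x))
        if 0 <= zi < shape[0] and 0 <= yi < shape[1] and 0 <= xi < shape[2]:
            centerline[zi, yi, xi] = True
            radius[zi, yi, xi] = r
    # EDT of the centerline's complement; also returns, for every voxel,
    # the coordinates of its nearest centerline voxel.
    dist, (iz, iy, ix) = distance_transform_edt(~centerline, return_indices=True)
    label = dist <= radius[iz, iy, ix]   # foreground within the node radius (assumed rule)
    return label.astype(np.uint8)
```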
4.2. Implementation Details
In the experiment, due to significant differences in image size, resolution, signal-to-noise ratio, and neuron morphology between the two datasets, they were trained and tested separately. Furthermore, owing to limitations in computational resources and variations in image dimensions, during the training phase, neuron images were cropped into nonoverlapping 128 × 128 × 32 blocks. Data augmentation included random flipping in the X–Y plane, random translation, and noise addition. In the testing phase, the entire test image of arbitrary size was used as input. The model was implemented in the PyTorch framework and run on an Intel Core CPU with a 24-GB Nvidia RTX 3090 GPU. During training, the SGD optimizer was used with an initial learning rate of 0.1 and a weight decay of 0.0005, and the network was trained for 50 epochs. To alleviate the sample imbalance in 3D neuron images, we employed a weighted combination of BCE and Dice loss. To investigate the impact of different α values, we trained and tested the network on the BigNeuron dataset with different α values and used APP2 to reconstruct neurons from the segmented images. We then evaluated the network's performance using segmentation metrics such as Precision and IoU, as well as the neuron reconstruction metrics ESA, DSA, and ADS. As shown in Figure 7, the network achieved the best overall performance and the most accurate predictions when α = 0.4. Therefore, in subsequent experiments, we set α to 0.4.
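For reference, the reported optimizer settings and the described augmentations can be sketched as follows; the momentum value, translation range, noise level, and flip axes are assumptions.

```python
import numpy as np
import torch

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # SGD with lr = 0.1 and weight decay = 5e-4 as reported; momentum is assumed.
    return torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)

def augment(cube: np.ndarray, label: np.ndarray, noise_std: float = 0.05):
    """Random flips in the X-Y plane, a small random translation, and additive noise."""
    if np.random.rand() < 0.5:                          # flip along the x-axis
        cube, label = cube[..., ::-1], label[..., ::-1]
    if np.random.rand() < 0.5:                          # flip along the y-axis
        cube, label = cube[:, ::-1, :], label[:, ::-1, :]
    shift = tuple(np.random.randint(-4, 5, size=3))     # random translation (voxels, assumed range)
    cube = np.roll(cube, shift, axis=(0, 1, 2))
    label = np.roll(label, shift, axis=(0, 1, 2))
    cube = cube + np.random.normal(0.0, noise_std, cube.shape)  # additive Gaussian noise
    return np.ascontiguousarray(cube), np.ascontiguousarray(label)
```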

4.3. Evaluation Metrics
In addition to the voxel-level segmentation metrics (Recall, Precision, and IoU), we employed three distance scores to evaluate the discrepancy between the generated reconstruction and the reference reconstruction, namely, the entire structure average (ESA), the different structure average (DSA), and the percentage of different structures (ADS), as defined in [36]. They are calculated as follows. For each node in the manual reconstruction, we compute the minimum spatial distance between that node and all nodes in the reconstruction generated by the computational method. The ESA is obtained by averaging all these reciprocal minimum spatial distances across the entire structure. Since nodes separated by less than 2 voxels are difficult to distinguish visually, the DSA is obtained by averaging only the distances of node pairs that differ by more than 2 voxels between the generated and manual reconstructions. The ADS is the proportion of node pairs whose reciprocal minimum spatial distances exceed 2 voxels among all node pairs. These three metrics reflect the correctness of neuronal node connections, and their values can be generated by the Vaa3D software plugin [21], which directly reports the three distances (ESA, DSA, and ADS) by comparing the computed reconstruction (generated SWC file) with the standard manual reconstruction (standard SWC file). Lower values indicate smaller discrepancies between the generated and manual reconstructions, and hence better reconstruction performance.
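A simplified, one-directional sketch of the three distance scores is shown below; the Vaa3D plugin [21] computes them bidirectionally and is what the reported numbers are based on, so the function name and the k-d tree search here are illustrative choices only.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_scores(auto_nodes, gold_nodes, thresh=2.0):
    """Simplified, one-directional ESA/DSA/ADS computation.

    auto_nodes, gold_nodes: (N, 3) arrays of node coordinates read from the
    automatically generated and the manual (gold-standard) SWC files.
    """
    # nearest automatically reconstructed node for every manual node
    dists, _ = cKDTree(np.asarray(auto_nodes)).query(np.asarray(gold_nodes))
    esa = dists.mean()                                  # entire structure average
    far = dists[dists > thresh]                         # node pairs differing by > 2 voxels
    dsa = far.mean() if far.size else 0.0               # different structure average
    ads = far.size / dists.size                         # percentage of different structures
    return esa, dsa, ads
```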
4.4. Evaluation on Images From the BigNeuron Dataset
To comprehensively evaluate the effectiveness of our proposed segmentation method in enhancing the performance of reconstruction algorithms, we conducted extensive comparative experiments. In these comparisons, our method was benchmarked not only against a series of advanced neuronal segmentation methods, including 3D U-Net (M1) [12], 3D Wave-Net (M2) [14], ER-Net (M3) [25], and ADTL-Net (M4) [26], but also against SwinUNETR (M5) [30] and 3D UX-Net (M6) [31], which currently perform exceptionally well in 3D image processing using a transformer-based architecture and large volumetric convolution kernels, respectively. These comprehensive comparative experiments aimed to validate the effectiveness and superiority of the segmentation model.
Table 2 shows the segmentation performance of the different segmentation methods. It is evident from the test results that 3D FreqSNet outperforms the compared segmentation methods on most metrics. Furthermore, after segmenting the neurons within the cubes, we assemble the cubes into a complete 3D neuronal image. Subsequently, we employ the APP2 algorithm to reconstruct the assembled neurons, generating automatically reconstructed SWC files. We then calculate the differences between these automatically reconstructed SWC files and the manually reconstructed ones using the Vaa3D software plugin [21], obtaining ESA, DSA, and ADS values to assess the accuracy of the automatic reconstruction. Table 3 shows the three average metrics for the 17 test neurons reconstructed with the APP2 algorithm from the segmentation results of the different methods. It can be observed that 3D FreqSNet achieves the best results on the reconstruction metrics ESA and ADS. Although 3D UX-Net obtains a relatively good DSA, it falls far behind the proposed method in the ESA and ADS metrics, because it produces erroneous segmentations by focusing excessively on noise. Figure 8 intuitively demonstrates the advantages of 3D FreqSNet. Figures 8(a1) and 8(b1) show neurons with strong background noise, and it can be seen in Figures 8(a2) and 8(b2) that 3D FreqSNet effectively removes the noise. Figures 8(b1) and 8(c1) show neurons with weak filament signals, and it can be seen in Figures 8(b2) and 8(c2) that 3D FreqSNet effectively enhances the weak neuronal signals. 3D FreqSNet significantly improves the segmentation quality of BigNeuron images and enhances the neuron reconstruction results. Figure 9 visualizes the reconstruction results generated by the various segmentation methods and the ground truth reconstruction. 3D FreqSNet obtains the most accurate and complete reconstruction results among all compared methods, while the other methods struggle to obtain precise reconstructions and often miss fine neuronal fiber structures (a5 and c4 in Figure 9). The reconstruction results of the three complex neuronal images are basically consistent with the ground truth, demonstrating the robustness and versatility of the proposed method.
Models | Categories | Recall↑ | Precision↑ | IoU↑ |
---|---|---|---|---|
3D U-Net | Background | 0.9952 | 0.9988 | 0.9941 |
Target neuron | 0.7611 | 0.4329 | 0.3812 | |
3D Wave-Net | Background | 0.9959 | 0.9986 | 0.9946 |
Target neuron | 0.7176 | 0.4585 | 0.3884 | |
ADTL-Net | Background | 0.9849 | 0.9997 | 0.9846 |
Target neuron | 0.9425 | 0.2281 | 0.2250 | |
ER-Net | Background | 0.9949 | 0.9986 | 0.9935 |
Target neuron | 0.7178 | 0.4001 | 0.3457 | |
SwinUNETR | Background | 0.9964 | 0.9984 | 0.9949 |
Target neuron | 0.6825 | 0.4731 | 0.3878 | |
3D UX-Net | Background | 0.9918 | 0.9992 | 0.9911 |
Target neuron | 0.8373 | 0.3270 | 0.3075 | |
Ours | Background | 0.9968 | 0.9985 | 0.9954 |
Target neuron | 0.6981 | 0.5118 | 0.4190 |
- Note: The bold values represent the best results for the target neuron and background, respectively.
- ↑ indicates that the larger the results of Recall, Precision, and IoU, the better.
Models | ESA↓ | DSA↓ | ADS↓ |
---|---|---|---|
3D U-Net | 4.9546 | 10.1750 | 0.2584 |
3D Wave-Net | 9.0442 | 15.6249 | 0.3361 |
ADTL-Net | 8.3560 | 14.5661 | 0.3393 |
ER-Net | 6.6785 | 11.8864 | 0.2735 |
3D UX-Net | 4.3721 | 5.9841 | 0.6759 |
SwinUNETR | 15.9690 | 21.3220 | 0.3538 |
Ours | 2.4356 | 6.9410 | 0.2120 |
- Note: The bold values indicate that this method performs the best among all methods in each column.
- ↓ indicates that the smaller the results of ESA, DSA, and ADS, the better the performance of this method.


4.5. Evaluation on Images From the CWMBS Dataset
To demonstrate the advantages of the proposed method in processing more challenging images of different categories, we conducted additional experiments using the CWMBS dataset. In this experiment, two groups of images with significant feature differences, namely, strong noise images and weak signal images, were selected to test and validate our method. As shown in Table 4, compared with six other state-of-the-art segmentation methods, the proposed method achieved the best performance in terms of recall, precision, and IoU. The qualitative visualization results of neuron segmentation are shown in Figure 10. By comparing (a1) with (a2) and (b1) with (b2), it is clear that 3D FreqSNet effectively removes noise and enhances the faint filament structures. We also visualized the reconstruction results using the APP2 method on both the original and segmented images, and it is evident that our method makes reconstruction easier and more accurate.
Models | Categories | Recall↑ | Precision↑ | IoU↑ |
---|---|---|---|---|
3D U-Net | Background | 0.9990 | 0.9992 | 0.9982 |
Target neuron | 0.2342 | 0.1904 | 0.1173 | |
3D Wave-Net | Background | 0.9996 | 0.9991 | 0.9989 |
Target neuron | 0.1185 | 0.3355 | 0.0957 | |
ADTL-Net | Background | 0.9992 | 0.9992 | 0.9984 |
Target neuron | 0.2319 | 0.2321 | 0.1312 | |
ER-Net | Background | 0.9981 | 0.9997 | 0.9979 |
Target neuron | 0.7879 | 0.2945 | 0.2728 | |
SwinUNETR | Background | 0.9994 | 0.9995 | 0.9989 |
Target neuron | 0.4936 | 0.4730 | 0.3185 | |
3D UX-Net | Background | 0.9993 | 0.9995 | 0.9989 |
Target neuron | 0.5548 | 0.4506 | 0.3309 | |
Ours | Background | 0.9993 | 0.9996 | 0.9989 |
Target neuron | 0.5985 | 0.4747 | 0.3601 |
- Note: Each column in bold represents the best results for the target neuron and background, respectively.
- ↑ indicates that the larger the results of Recall, Precision, and IoU, the better.

The comparison of reconstruction results using different segmentation methods is shown in Figure 11. Previous state-of-the-art methods such as 3D U-Net, ER-Net, and ADTL-Net produce discontinuous segmentations, leading to incomplete reconstructed structures (b2, b3, and b4 in Figure 11). SwinUNETR oversegments noisy images, resulting in unnecessary scatter (b5 in Figure 11). In contrast, 3D FreqSNet achieves better reconstruction results, further demonstrating that our method greatly benefits noise reduction and enhancement of neuron structure signals.

4.6. Comparison of Different Reconstruction Algorithms
We further quantitatively evaluated the impact of 3D FreqSNet on different neuron reconstruction algorithms. The segmentation methods were compared using five available neuron reconstruction algorithms, namely, APP1 [16], APP2 [2], NeuroGPS-Tree (GPS) [37], MST_Tracing (MST) [38], and Snake [39]. The box plots of the quantitative analysis are shown in Figures 12(a), 12(b), and 12(c). Compared with reconstructing neuron structures directly from images without any preprocessing, the segmentation maps generated by the proposed model yielded significant improvements in various reconstruction metrics. It can also be observed that, across the five reconstruction algorithms, 3D FreqSNet outperformed the other segmentation methods on most neuron reconstruction metrics. Due to the loss of a large number of nerve fibers in the segmentation results of 3D Wave-Net (M2) and ADTL-Net (M4), their reconstruction results were worse than those obtained on the original images. Similarly, as shown in Figures 12(d), 12(e), and 12(f), 3D FreqSNet significantly improved the performance of the five reconstruction algorithms on CWMBS test images, outperforming the reconstructions on the original images in various quantitative metrics and achieving better reconstruction metrics than the M1–M6 segmentation methods. Overall, our proposed method extracts more complete neuron structures from complex images, which greatly simplifies the subsequent neuron tracing process and further improves reconstruction performance on the BigNeuron and CWMBS datasets.






4.7. Computational Complexity Analysis
In practical applications, computational complexity is an important consideration. The number of model parameters and the number of floating-point operations (FLOPs) are used to evaluate the computational complexity of different methods. Our model has 6.24 M parameters and requires 143.25 G FLOPs. As shown in Figure 13, SwinUNETR (M5) and 3D UX-Net (M6) have large parameter counts and high computational costs, owing to the quadratic complexity of self-attention and the large volumetric kernels, respectively, which limits their application in large-scale neuronal reconstruction. Methods such as 3D U-Net (M1), 3D Wave-Net (M2), ER-Net (M3), and ADTL-Net (M4) have relatively smaller parameter and FLOP counts, but their neuronal reconstruction performance is not high. As can be seen from Figure 13 and Tables 3 and 5, our method achieves a better balance between accuracy and computational cost, making it more suitable for practical use and the reconstruction of very large-scale neurons.
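For reference, the parameter count can be obtained directly in PyTorch as sketched below; the FLOPs figure requires a separate profiling tool that counts per-layer multiply-add operations, which is not specified here.

```python
import torch

def parameter_count_m(model: torch.nn.Module) -> float:
    """Number of trainable parameters, in millions (e.g., 6.24 for our model)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# FLOPs for one forward pass (143.25 G here) are measured with a separate
# profiling tool; the specific tool is not prescribed by this sketch.
```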

Category | Method | ESA↓ | DSA↓ | ADS↓ |
---|---|---|---|---|
Strong noise | 3D U-Net | 15.0077 | 19.8790 | 0.4669 |
3D Wave-Net | 24.3382 | 29.0396 | 0.4943 | |
ER-NET | 22.3417 | 27.1394 | 0.4831 | |
ADTL-Net | 12.6981 | 17.2158 | 0.4778 | |
3D UX-Net | 5.3132 | 13.0049 | 0.2672 | |
SwinUNETR | 6.8669 | 13.0615 | 0.2987 | |
Ours | 4.7137 | 12.7418 | 0.2413 | |
Weak signal | 3D U-Net | 24.7178 | 29.0057 | 0.4561 |
3D Wave-Net | 36.7628 | 40.7563 | 0.5590 | |
ER-NET | 32.2134 | 41.8694 | 0.4647 | |
ADTL-Net | 23.3912 | 30.7686 | 0.5375 | |
3D UX-Net | 9.1634 | 16.6115 | 0.2569 | |
SwinUNETR | 13.2005 | 19.1474 | 0.3464 | |
Ours | 6.3835 | 13.0442 | 0.2126 |
- Note: The bold values indicate that this method performs the best among all methods in each column.
- ↓ indicates that the smaller the results of ESA, DSA, and ADS, the better the performance of this method.
4.8. Ablation Experiments
4.8.1. Ablation Experiments With 3D FreqSNet
In this method, we proposed the FSRA, MAFM, and FSM. To demonstrate the effectiveness of these modules, comprehensive ablation experiments were conducted on both the BigNeuron dataset and the CWMBS dataset. As shown in Table 6, the APP2 algorithm was used to reconstruct and evaluate the 17 BigNeuron test images and the 40 CWMBS test images. In addition, we also report the average inference time on the BigNeuron test images for each variant. Firstly, to verify the effectiveness of the backbone network, we replaced the original blocks in 3D U-Net with ResBlocks and denote the result as ResU-Net. Comparing Tables 3 and 6, ResU-Net improved the ESA, DSA, and ADS reconstruction results by 1.1002, 1.3456, and 0.0363, respectively, on the BigNeuron dataset. Furthermore, comparing Tables 5 and 6, the ESA, DSA, and ADS of 3D U-Net on the CWMBS dataset are 19.8627, 24.4423, and 0.4615, respectively, while those of ResU-Net are 10.1511, 15.9947, and 0.3368, respectively, which further demonstrates the effectiveness of the backbone network. Then, we conducted ablation experiments on the proposed modules based on ResU-Net (Net1).
Design | ResU-Net (backbone) | FSRA | MAFM | FSM | Params (M) | Inference time (s) | BigNeuron ESA↓ | BigNeuron DSA↓ | BigNeuron ADS↓ | CWMBS ESA↓ | CWMBS DSA↓ | CWMBS ADS↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Net1 | √ | | | | 7.34 | 7.44 | 3.8554 | 8.8294 | 0.2220 | 10.1511 | 15.9947 | 0.3368 |
Net2 | √ | √ | | | 6.42 | 7.11 | 3.2351 | 7.8325 | 0.2132 | 8.0932 | 15.7590 | 0.2563 |
Net3 | √ | √ | | √ | 5.28 | 11.67 | 2.8864 | 7.5045 | 0.2199 | 7.9587 | 15.4347 | 0.2560 |
Net4 | √ | √ | √ | | 6.82 | 18.23 | 2.7337 | 7.0147 | 0.2103 | 6.4211 | 14.5223 | 0.2358 |
Net5 | √ | √ | √ | √ | 6.24 | 15.23 | 2.5048 | 6.9185 | 0.2095 | 5.5486 | 12.8930 | 0.2269 |
- Note: The bold values indicate that this method performs the best among all methods in each column.
- ↓ indicates that the smaller the results of ESA, DSA, and ADS, the better the performance of this method.
Net2 replaces the ResBlock in Net1 with the FSRA. On the BigNeuron and CWMBS datasets, the neuron reconstruction metrics ESA improved by 0.6203 and 2.0579, respectively, DSA improved by 0.9969 and 0.2357, respectively, and ADS improved by 0.0088 and 0.0805, respectively. Compared to only retaining the spatial domain (Net1), introducing the FFT (Net2) significantly improves performance, indicating that the model can capture richer global information in the frequency domain, which is beneficial for improving neuron structure reconstruction performance. The FSRA module enhances the network’s learning capability by simultaneously learning information from both the spatial and frequency domains, thereby reducing noise interference and better capturing texture shapes.
To demonstrate the effectiveness of the MAFM, we removed it from the full model (Net3). Without the ability to exploit cross-layer contextual features, the ESA on the two datasets worsened by 0.3816 and 2.4101, the DSA by 0.586 and 2.5417, and the ADS by 0.0104 and 0.0291, respectively.
Similarly, if the FSM is not used (Net4), the model's ability to adaptively select important features is greatly weakened, which is clearly reflected in the reconstruction metrics on the two datasets: the ESA worsened by 0.2289 and 0.8725, the DSA by 0.0962 and 1.6293, and the ADS by 0.0008 and 0.0089, respectively. In addition, the model parameters and inference time also increased slightly, because without the FSM redundant feature computations grow and inference efficiency drops.
The outstanding performance of 3D FreqSNet is attributed to the synergistic effect of the FSRA, MAFM, and FSM, which together enable the model to effectively capture complex spatial and frequency dependencies and to accurately and robustly extract the features required for neuron image segmentation.
4.8.2. Ablation Experiments on the FSRA Module
To verify the impact of the FFT and spatial-domain convolution branches on the internal performance of the FSRA block, we conducted ablation experiments on the FSRA, as shown in Table 7. Firstly, after removing both the FFT and the 3 × 3 × 3 convolution block (replaced with a Conv1 × 1 × 1), the ESA, DSA, and ADS on the BigNeuron dataset were 3.3526, 8.6201, and 0.2446, respectively, while those on the CWMBS dataset were 11.6143, 21.0707, and 0.2998, respectively. Compared to retaining the FFT and Conv3 × 3 × 3, performance decreased, indicating that ordinary convolutions struggle to fully capture global information. Subsequently, compared with using neither FFT nor Conv3 × 3 × 3, introducing only the spatial-domain convolution did not significantly improve the ESA, DSA, and ADS values. In contrast, introducing only the FFT significantly improved performance, indicating its ability to capture rich global information in the frequency domain. Finally, when both the FFT and Conv3 × 3 × 3 were used, the ESA, DSA, and ADS on the BigNeuron dataset were 2.5048, 6.9185, and 0.2095, respectively, while those on the CWMBS dataset were 5.5486, 12.8930, and 0.2269, respectively. This demonstrates that combining the FFT and Conv3 × 3 × 3 effectively exploits the complementary advantages of the frequency and spatial domains, achieving excellent segmentation and reconstruction performance.
FFT | Spatial branch | BigNeuron ESA | BigNeuron DSA | BigNeuron ADS | CWMBS ESA | CWMBS DSA | CWMBS ADS |
---|---|---|---|---|---|---|---|
— | — | 3.3526 | 8.6201 | 0.2446 | 11.6143 | 21.0707 | 0.2998 |
— | √ | 3.9955 | 8.5949 | 0.2420 | 11.8177 | 17.5911 | 0.3517 |
√ | — | 2.8876 | 7.1483 | 0.2195 | 6.8812 | 13.4469 | 0.2723 |
√ | √ | 2.5048 | 6.9185 | 0.2095 | 5.5486 | 12.8930 | 0.2269 |
- Note: The bold values indicate that this method performs the best among all methods in each column.
4.8.3. Visualization of Ablation Experiments
Figure 14 intuitively visualizes the reconstruction of an example neuron image. Figure 14(a) shows the original image, and Figure 14(b) shows the manually reconstructed neuron, which accurately depicts the morphological structure of the neuron. Figure 14(c) displays the APP2 reconstruction on the original image, while Figure 14(d) shows the APP2 reconstruction based on the segmentation map of ResU-Net. The following three images are the APP2 reconstructions of neuron images segmented by the different ablation variants, and Figure 14(h) presents the APP2 reconstruction on the neurons segmented by 3D FreqSNet. On the original image, APP2 is not only susceptible to noise interference, mistakenly tracing noise as nerve fibers, but also stops tracing weak-signal neurons. By comparing Figures 14(d), 14(e), 14(f), 14(g), and 14(h), it can be observed that adding the FSRA, MAFM, and FSM in turn progressively improves the neuron reconstruction, further demonstrating the effectiveness of these modules. As indicated by the yellow arrows, 3D FreqSNet captures richer texture details and achieves more complete reconstructions, illustrating its ability to suppress noise and enhance weak neuronal fiber structures.

5. Conclusion
In this study, we proposed a frequency and spatial domain integrated encoder–decoder 3D segmentation network, called 3D FreqSNet, which mainly consists of three modules: the FSRA, the MAFM, and the FSM. Firstly, the FSRA extracts global and local information from frequency and spatial domain feature maps, effectively integrating frequency–spatial information and reducing the network's susceptibility to noise. Secondly, the MAFM enhances multiscale contextual perception by integrating and fusing spatial and channel positional information. Finally, the FSM avoids feature redundancy by adaptively selecting features from the encoder and decoder at the same hierarchy level. We conducted extensive experiments on two complex 3D neuron image datasets, and the results show that the proposed model not only significantly improves the reconstruction results of five existing neuron tracing algorithms but also demonstrates clear advantages over other advanced neuron segmentation methods in improving neuron reconstruction performance. At the same time, it exhibits higher accuracy in segmenting neuron images with strong background noise and weak voxel signals. In addition, 3D FreqSNet achieves a good balance between accuracy and computational cost, which is conducive to large-scale neuron reconstruction.
However, this study still has some limitations. Firstly, the weighting coefficient α of the proposed loss function needs to be manually adjusted for each dataset, which relies on subjective experience. To address this, we will develop a method for setting α adaptively in the future. Secondly, the proposed segmentation algorithm is based on fully supervised learning and requires corresponding manually reconstructed annotation data. However, large-scale manual annotation of neurons demands extensive neurobiology knowledge and human resources, making it difficult to obtain. To reduce the need for large amounts of annotated data, we will explore semisupervised, transfer learning, or active learning methods in the future. Finally, this study only explores the reconstruction of neurons on two animal brain image datasets. In the future, we will further explore the reconstruction of ultra-large-scale neuronal clusters on whole-brain images of animals, which will contribute to the analysis of brain neural networks and brain functional mechanisms, thereby promoting further research on brain cognition, brain diseases, and other topics.
Conflicts of Interest
The authors declare no conflicts of interest.
Author Contributions
Haixing Song: writing–review and editing, writing–original draft, software, methodology, formal analysis, and conceptualization. Xuqing Zeng: writing–review and editing, visualization, and validation. Guanglian Li: supervision, resources, and formal analysis. Rongqing Wu: methodology and conceptualization. Simin Liu: resources and data curation. Fuyun He: supervision, resources, project administration, and funding acquisition.
Funding
This work was supported by the National Natural Science Foundation of China (No. 62062014) and the grant from Guangxi Key Laboratory of Brain-inspired Computing and Intelligent Chips (No. BCIC-23-Z1).
Open Research
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.