Volume 2025, Issue 1 7026120
Research Article
Open Access

Neuron Segmentation via a Frequency and Spatial Domain–Integrated Encoder–Decoder Network

Haixing Song 1,2, Xuqing Zeng 1, Guanglian Li 1, Rongqing Wu 1, Simin Liu 2, and Fuyun He 1,2 (corresponding author)

1 Guangxi Key Laboratory of Brain-Inspired Computing and Intelligent Chips, Department of Artificial Intelligence, School of Electronic and Information Engineering, Guangxi Normal University, Guilin, China
2 Key Laboratory of Nonlinear Circuit and Optical Communication (Guangxi Normal University), Education Department of Guangxi Zhuang Autonomous Region, Guilin, China
First published: 17 February 2025
Academic Editor: Eugenio Vocaturo

Abstract

Three-dimensional (3D) segmentation of neurons is a crucial step in the digital reconstruction of neurons and serves as an important foundation for brain science research. In neuron segmentation, the U-Net and its variants have shown promising results. However, because these methods focus primarily on learning spatial domain features, they overlook the abundant global information available in the frequency domain. Furthermore, issues such as insufficient processing of contextual features by skip connections and redundant features resulting from simple channel concatenation in the decoder limit the accurate segmentation of neuronal fiber structures. To address these problems, we propose an encoder–decoder segmentation network that integrates the frequency and spatial domains to enhance neuron reconstruction. To simplify the segmentation task, we first divide the neuron images into neuronal cubes. We then design 3D FreqSNet, which leverages both frequency and spatial domain features to segment the target neurons within these cubes. Next, we introduce a multiscale attention fusion module (MAFM) that utilizes spatial and channel position information to enhance contextual feature representation. In addition, a feature selection module (FSM) is incorporated to adaptively select discriminative features from both the encoder and decoder, increasing the weight on critical neuron locations and significantly improving segmentation performance. Finally, the segmented nerve fiber cubes are assembled into complete neurons and digitally reconstructed using available neuron tracing algorithms. In experiments, we evaluated 3D FreqSNet on two challenging 3D neuron image datasets (the BigNeuron dataset and the CWMBS dataset). Compared with other advanced segmentation methods, 3D FreqSNet extracts target neurons more accurately from noisy images and images with weakly visible neuronal fibers, effectively improving the performance of 3D neuron segmentation and reconstruction.

1. Introduction

Neuronal morphology plays a significant role in the field of neuroscience. 3D digital reconstruction of neurons aims to establish tree-like models of 3D neuronal morphology from optical microscopy images, which is crucial for understanding the mechanisms of the nervous system and the functions of the brain [1]. However, due to the limitations of current imaging technologies, the quality of the generated 3D brain neuron images varies significantly. As shown in Figure 1, intense background noise and weak neuron signals pose significant challenges to existing automatic or semiautomatic brain neuron reconstruction algorithms [2–7], making it difficult to achieve accurate neuron reconstruction results. Neuron segmentation, as a preprocessing step, can effectively and automatically denoise images and enhance weak signals, making the entire tracing process easier and more precise. In the early stages, traditional neuron segmentation algorithms were adopted, such as threshold segmentation [8], maximum likelihood global trees [9], 3D tubular models [10], and data-driven methods that encode context-tagged information [11]. However, these methods often require parameter tuning and generally only work well for images with high contrast and clear backgrounds, making them difficult to apply to intricate neuron images.

In recent years, deep learning–based methods have gained significant traction in the field of neuron image segmentation. Among them, convolutional neural networks (CNNs) are particularly representative, with 3D U-Net [12] and its variants [13–15] being the most notable examples. However, these advanced algorithms still struggle to precisely extract neuron information from challenging images. First, current deep learning methods typically extract semantic features only in the spatial domain, exhibiting limitations in learning global contextual information. Frequency domain features encapsulate crucial global information about the image; combining them with spatial domain features can yield a richer, more comprehensive feature representation, thereby enhancing the accuracy of neuron segmentation. Second, although the U-Net architecture can merge low-level and high-level image features through skip connections, this naive approach and the constraints of the receptive field can easily result in insufficient processing of multiscale contextual features. Furthermore, at the end of the skip connections, where the feature maps of the decoder and the symmetrical encoder are concatenated, fusing features by simple channel concatenation can lead to issues such as feature redundancy and excessive computational overhead.

Figure 1: Examples of volume microscopy images and neuron reconstruction results. (a) Original human neuron image with strong background noise and weak filament signals. (b) The manual reconstruction of the neuron. (c) Reconstruction results of the APP2 [2] tracing method, which misses many filaments with weak signals. (d) Reconstruction results of the MOST [3] tracing method, which traces many redundant neuronal structures.
To address the issues mentioned above, we propose a novel 3D segmentation network designed to leverage frequency and spatial features for processing image blocks with varying signal and noise intensities, thereby efficiently and accurately extracting neuronal structures for neuron reconstruction. Specifically, to effectively integrate frequency and spatial features and extract rich frequency-spatial information, we propose the frequency-spatial representation aggregation block (FSRA) in the encoder stage of the network. The FSRA block employs 3D fast Fourier transform (FFT) and 3D convolution to fully integrate frequency and spatial domain information. Meanwhile, in the skip connection stage of the network, we propose the multiscale attention fusion module (MAFM), which can better enhance the network’s ability to learn multiscale contexts and address the imbalance in feature map distribution within the 3D space. Furthermore, since not all features are effectively applicable to extract neuronal structures, we utilize a feature selection module (FSM) to recover features from the decoder and adaptively select effective features from the encoded features of the encoder, thereby capturing more details of neuronal structures. Finally, a mixed loss of Dice and cross-entropy is adopted to address the issue of positive–negative imbalance in neuron image samples. In summary, the main contributions of this work are as follows:
1. We propose a neuron segmentation method based on an encoder–decoder network that utilizes both the frequency and spatial domains, effectively combining frequency and spatial information for the segmentation of large-scale neuronal images.

2. We propose the FSRA, which performs feature learning in both the frequency and spatial domains, preserving the ability to learn global information while maintaining the integrity of local information.

3. We propose the MAFM and the FSM. The former enhances contextual feature representation by utilizing spatial and channel position information, while the latter adaptively selects features and increases the weight of key neuronal voxels to avoid feature redundancy.

4. Extensive experiments on two 3D neuronal datasets demonstrate that our segmentation results achieve state-of-the-art performance across five different neuron reconstruction algorithms, striking a balance between computational cost and reconstruction performance.

2. Related Works

2.1. 3D Neuronal Reconstruction

Neuronal morphology reconstruction, also known as neuronal tracing, aims to extract quantitative data on neuronal morphology from 3D microscopic images of neurons and to establish a 3D morphological structure model of each neuron. This process involves identifying all nodes of the neural fibers, establishing the topological structure between these points, and measuring the radius at each node. In recent years, numerous experts and scholars in neuron-related fields have developed various automatic or semiautomatic neuronal tracing techniques. For instance, Peng et al. developed APP [16] and APP2 [2], Zhou et al. introduced TReMAP [17], Chen et al. presented SmartTracing [18], Zhao et al. proposed Tube Models [19], Ming et al. developed MOST [3], and the BigNeuron project [20] compiled dozens of tracing algorithms and integrated them into the Vaa3D software [21]. In general, these algorithms leverage graph theory principles to mathematically model neuronal morphology and neural fibers. However, they face two major challenges in practical applications: first, they may erroneously identify background noise as neural fibers; second, they may overlook a significant number of genuinely existing neural fibers, as illustrated in Figure 1. Although these tracing algorithms are highly sensitive to image quality, they often have to contend with low signal-to-noise ratios caused by limitations in current neuronal imaging technologies. Therefore, there is an urgent need for an automatic and rapid algorithm that can reduce the influence of noise, enhance weak neuronal structures, and ultimately improve the performance of neuronal reconstruction.

2.2. Deep Learning–Based Neuron Segmentation Methods

Before performing neuron reconstruction, accurate neuron segmentation is an effective strategy to enhance the performance of neuron tracing algorithms. In recent years, deep learning–based methods have demonstrated superior performance in medical image analysis without the need for manual feature extraction, and deep learning has also been introduced into neuron image segmentation. For example, Liu et al. [22] proposed a novel 2.5D framework that uses a stack of adjacent 2D slices in a 3D image as the input to train a 2D CNN, effectively mitigating substantial background noise in neuron images. Nevertheless, this approach does not take full advantage of the depth information in the 3D image because it relies on 2D convolutions, potentially leading to inaccurate segmentation results. Li et al. [23] introduced the first 3D residual deep network for neuron segmentation to improve 3D neuron reconstruction performance. Liu et al. [13] further enhanced neuron structures and removed image noise by improving upon the V-Net architecture; however, due to the significant imbalance between foreground and background classes in neuron images, this network fails to effectively capture foreground features. To mitigate the aliasing effects caused by downsampling, Li and Shen [14] designed a 3D wavelet-integrated network to remove data noise. However, suppressing high-frequency information at each downsampling stage leads to excessive loss of edge details. The anisotropy of neuron images results in incomplete preservation of spatial information, causing the loss of some fine neuronal structures. To address this issue, Yang et al. [24] proposed a two-stage 3D neuron segmentation method: in the first stage, an FCN is trained to obtain a neuron segmentation map, and broken structures are then repaired using a Hessian-repair model based on the segmentation results. However, this repair method is complex and time-consuming, and it cannot effectively repair line segments that are too far apart. To achieve clear edge segmentation in vessel images, Xia et al. [25] enhanced the weights of edge voxels through a reverse edge attention block and an edge optimization loss. Liu et al. [26] proposed a novel adaptive learning network that utilizes classification results to guide neuron segmentation, reducing model computational complexity and enabling real-time segmentation; however, the advantages of this method often come at the cost of some model accuracy.

Furthermore, existing neuron segmentation algorithms primarily focus on acquiring spatial domain information while neglecting the importance of frequency domain information. In general computer vision applications, extracting features in the frequency domain has been demonstrated to be a powerful approach [27, 28]. For instance, GFNet [28] uses the 2D discrete Fourier transform to replace the self-attention mechanism in ViT [29], employing global filtering layers in the frequency domain to facilitate information exchange between tokens and thereby more effectively capturing long-range dependencies. However, the frequency domain remains underexplored in 3D semantic segmentation. In the spatial domain, the boundaries between segmentation objects and the background are often blurred, whereas in the frequency domain, objects located at different frequencies can be easily distinguished. Inspired by this, we attempt to leverage frequency domain information to improve the performance of 3D neuron image segmentation.

To provide a concise overview, Table 1 summarizes the related works in tabular form. The table highlights the advantages and disadvantages of each method relative to ours and indicates whether the method operates in the frequency domain or the spatial domain.

Table 1. Summary of related works.
References | Methodology used | Advantages | Disadvantages | Frequency or spatial domain
[22] | 2.5D CNN | Easy to train and low computational complexity | Ignores 3D spatial information | Spatial domain
[23] | 3D CNNs | The first 3D residual deep network for neuron segmentation | Unable to effectively segment complex neuron images | Spatial domain
[13] | Improved V-Net | Uses anisotropic convolutional kernels and a varying number of layers to adapt to neuronal datasets | Unable to capture foreground features efficiently | Spatial domain
[14] | 3D WaveUNet | The first 3D wavelet-integrated network | Suppressing high-frequency information causes the network to lose excessive edge detail | Spatial domain
[24] | FCN, a ray-shooting model, and a Hessian-repair model | Effectively applied to challenging datasets contaminated by noise or containing weak filament signals | The repair method is complicated, time-consuming, and difficult to apply to line segments that are far apart | Spatial domain
[25] | ER-Net | Effectively extracts spatial edge information | Overall performance is not high, and the computational load is heavy | Spatial domain
[26] | ADTL-Net | Short inference time | Sacrifices model accuracy | Spatial domain
[27] | Camouflaged object detection in the frequency domain | Introduces frequency domain perception cues into CNN models | Frequency domain unexplored in 3D | Frequency domain
[28] | GFNet | 2D DFT replaces the self-attention mechanism; simple and computationally efficient | — | Frequency domain
[30] | SwinUNETR | Excellent global information capture capability | High computational complexity | Spatial domain
[31] | 3D UX-Net | Large volumetric convolutional kernels enhance the capture of spatial information | High computational resource requirements and a relatively complex training process | Spatial domain

3. Methods

3.1. The Method Pipeline

In this study, we developed a pipeline for our 3D neuron segmentation method, as detailed in Figure 2. The method comprises four crucial steps. Step A is partition: the long nerve fibers of a neuron may be widely dispersed over large brain regions, which makes 3D neuron segmentation on large-scale images computationally expensive. To reduce this complexity, we simplify the segmentation process by cutting the neuron image into a number of small cubes, with dimensions of 32 voxels in depth (z-axis), 128 in height (y-axis), and 128 in width (x-axis). Step B is segmentation, where a trained segmentation network segments the neuronal cubes. Step C is assembly, where the segmented neural fibers are placed back according to their spatial positions within the neuron image, ultimately completing the 3D neuron segmentation task. Step D is reconstruction, where the segmented neuron images serve as the input to five available tracing algorithms for neuron reconstruction.
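The partition (step A) and assembly (step C) stages amount to simple array bookkeeping. The sketch below is a minimal illustration that assumes the neuron image is a NumPy array in (z, y, x) order and zero-pads edge blocks to the fixed cube size; the function names and padding strategy are our own assumptions rather than details taken from the paper.

```python
import numpy as np

def partition_into_cubes(volume, cube_shape=(32, 128, 128)):
    """Split a 3D neuron image (z, y, x) into non-overlapping cubes.

    Edge blocks are zero-padded up to cube_shape so that every block has
    the fixed size expected by the segmentation network. The cube origins
    are returned so that step C can place the segmented cubes back.
    """
    dz, dy, dx = cube_shape
    cubes, origins = [], []
    for z in range(0, volume.shape[0], dz):
        for y in range(0, volume.shape[1], dy):
            for x in range(0, volume.shape[2], dx):
                block = volume[z:z + dz, y:y + dy, x:x + dx]
                pad = [(0, dz - block.shape[0]),
                       (0, dy - block.shape[1]),
                       (0, dx - block.shape[2])]
                cubes.append(np.pad(block, pad, mode="constant"))
                origins.append((z, y, x))
    return np.stack(cubes), origins

def assemble_from_cubes(cubes, origins, full_shape):
    """Step C: place segmented cubes back at their original positions."""
    out = np.zeros(full_shape, dtype=cubes.dtype)
    for cube, (z, y, x) in zip(cubes, origins):
        dz, dy, dx = cube.shape
        # Crop any padding that falls outside the original volume.
        out[z:z + dz, y:y + dy, x:x + dx] = cube[:full_shape[0] - z,
                                                 :full_shape[1] - y,
                                                 :full_shape[2] - x]
    return out
```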

Figure 2: The pipeline of our 3D neuron segmentation method: (A) partition, (B) segmentation, (C) assembly, and (D) reconstruction.

3.2. Overall Structure

The overall architecture of the proposed segmentation method is shown in Figure 3. The network employs a symmetrical encoder–decoder framework. In the encoder stage, each of the first three layers consists of a frequency-spatial representation aggregation block followed by a standard convolution with a kernel size of 2 and a stride of 2. After each encoder layer, the feature dimension doubles while the resolution halves. The skip connections employ a MAFM to perform cross-scale spatial-channel feature learning, transferring rich contextual information to the decoder. The decoder comprises three layers, each consisting of an FSM and a residual block [32], followed by a deconvolution with a kernel size of 2 and a stride of 2 for upsampling. Contrary to the encoder, the channel dimension of the feature maps halves after each decoder layer, while the resolution doubles.

Figure 3: The overall framework of the proposed network.

3.3. Frequency Space Representation Aggregation Block

For the model, mining more frequency domain information can enhance the discrimination between categories, making the boundaries between categories clearer and thereby improving the effectiveness of semantic segmentation. The FFT plays a crucial role in frequency-based image analysis: even for low-quality X-rays, fine details and contours can be captured in the frequency domain [33]. Convolutional layers possess a strong inductive bias toward texture, enabling them to learn texture-based features. Representing objects using frequency characteristics can reduce the impact of this texture bias. Specifically, high-frequency components primarily carry texture information such as boundary details, while low-frequency components are more related to the overall shape and structure of the object. Therefore, representing objects with frequency characteristics combines the shape and texture information contained in different frequency components, which facilitates better separation of the objects of interest and aids in a more precise analysis of image texture and shape. Inspired by this, signals or noise in neuron images that cannot be effectively extracted or filtered in the spatial domain can be processed in the frequency domain. Unlike CNNs and ViTs, the proposed FSRA filters images from a frequency domain perspective, with filtering operations that are global and linear.

As shown in Figure 4, the FSRA block first feeds the feature maps into two separate branches simultaneously. The frequency domain branch transforms spatial domain features into frequency domain features through the FFT and employs filters with learnable weight parameters to determine the weight of each frequency component, thereby accurately capturing the lines and edges of the image. The filters automatically update their parameters through back-propagation during network training, enabling adaptive feature learning from a global perspective. The feature maps are then converted back to the spatial domain through the inverse FFT to obtain the processed signals. Next, the local features captured by the convolutional layer are combined with the global features extracted by the FFT branch to achieve a more comprehensive feature representation. Finally, an element-wise addition is performed between the fused feature maps and the original feature maps, further breaking the symmetry of the network and contributing to its expressive ability. The entire process can be described as follows:
()
where the batch normalization (BN) and ReLU activation operations are applied in combination, ⊗ denotes the convolution operation, ⊕ denotes element-wise summation, and K denotes the convolution kernel, with K1 ∈ ℝ^{3×3×3} and K2 ∈ ℝ^{1×1×1}. By integrating features from both the frequency and spatial domains, the FSRA augments the network's feature extraction capability, rendering it more robust to neuronal structures of varying scales. This approach mitigates the influence of the noise and blurred edges inherent in 3D neuronal images, thereby facilitating the model's ability to accurately extract neuronal architectures from diverse and complex scenarios.
Figure 4: Schematic diagram of the frequency space representation aggregation block.
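To make the two-branch design concrete, the following PyTorch sketch shows one way an FSRA-style block could be realized: a GFNet-style learnable per-frequency filter applied after a 3D FFT, a local 3 × 3 × 3 convolutional branch, and a 1 × 1 × 1 fusion followed by a residual addition. The class name, the fixed input size, and the initialization are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class FSRABlock(nn.Module):
    """Sketch of a frequency-spatial aggregation block: a global filter in
    the Fourier domain plus a local 3x3x3 convolutional branch, fused with a
    1x1x1 convolution and added back to the input."""

    def __init__(self, channels, spatial_size=(32, 128, 128)):
        super().__init__()
        d, h, w = spatial_size
        # Learnable complex-valued global filter, one weight per frequency
        # component (stored as real/imaginary parts); updated by backprop.
        self.filter = nn.Parameter(
            torch.randn(channels, d, h, w // 2 + 1, 2) * 0.02)
        self.spatial = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True))
        # 1x1x1 convolution fuses the concatenated branches.
        self.fuse = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True))

    def forward(self, x):                      # x: (B, C, D, H, W)
        # Frequency branch: FFT -> learnable filtering -> inverse FFT.
        freq = torch.fft.rfftn(x, dim=(2, 3, 4), norm="ortho")
        freq = freq * torch.view_as_complex(self.filter)
        freq = torch.fft.irfftn(freq, s=x.shape[2:], dim=(2, 3, 4),
                                norm="ortho")
        # Spatial branch: local texture/edge features via 3x3x3 convolution.
        spat = self.spatial(x)
        # Concatenate both branches, fuse, and add the residual input.
        return self.fuse(torch.cat([freq, spat], dim=1)) + x
```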

3.4. MAFM

To address the issues of inadequate contextual feature processing and receptive field limitations, we propose a MAFM to replace the original skip connections and provide effective multiscale feature representation. Combining convolutional layers with different kernel sizes yields various receptive fields for multiscale feature representation; however, a significant drawback of this approach is that large convolutional kernels considerably increase the computational cost and thereby reduce efficiency. Inspired by the ASPP block [34], convolutional layers with different dilation rates can efficiently capture receptive fields of different scales, providing an efficient and economical means of multiscale feature representation. Due to the limitations of 3D imaging technology, axial undersampling may occur during imaging, so 3D neuronal microscopy images have a lower resolution in the axial direction (z) than in the transverse directions (x and y). This anisotropic spatial resolution affects the network's extraction and representation of image features. To mitigate this issue, anisotropic convolutional kernels are employed to extract information from multiple receptive fields. Consequently, we use three dilated convolutional kernels of size 3 × 3 × 1 with dilation rates of 1, 6, and 12 to expand the receptive field and learn different spatial information.

The structure of the MAFM is shown in Figure 5. Low-level features are rich in spatial details, while high-level features possess richer semantic information. To obtain more contextual information, the low-level and high-level feature maps are first concatenated and fed into a channel attention module to capture the importance of each channel for the final task. Subsequently, feature maps extracted using anisotropic convolutional kernels with different dilation rates are combined to effectively fuse spatial information and enhance the network's spatial perception capability. Then, 1 × 1 × 1 convolutions are used to reduce the channel dimension of the concatenated feature maps, compressing the multichannel representation and extracting key features. Finally, the learned spatial-channel weight vectors are multiplied with the original feature maps to focus on different regions of the input, thereby optimizing the feature representation and enhancing the performance of neuronal image semantic segmentation. The computational process of the MAFM is shown in the following equations.
()
()
()
()
where X_c ∈ ℝ^{C×H×W×D} denotes the feature map after channel attention processing, M_{s,i} denotes the output of the anisotropic convolution with the ith dilation rate, A_s is the output of the spatial information fusion, and X_sc is the output after combining the channel and spatial information. X_sc is passed through a sigmoid transformation to obtain a weighted mapping N ∈ ℝ^{C×H×W×D}. The final feature map output is computed as shown in the following equation.
()
where Ni represents the ith channel of N. The MAFM is able to obtain rich contextual features by effectively integrating feature information from different levels, preserving abundant details and semantic information at each stage. These comprehensive features play an important guiding role in the subsequent decoding process.
Figure 5: Schematic diagram of the multiscale attention fusion module.
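As a rough illustration of the module described above, the sketch below pairs an SE-style channel attention (an assumption; the paper does not specify the exact channel attention design) with the three anisotropic 1 × 3 × 3 dilated branches (dilation rates 1, 6, and 12) and a final sigmoid re-weighting of the original maps. It assumes the low- and high-level inputs have already been brought to the same spatial resolution, and the channel handling of the 1 × 1 × 1 reduction is simplified.

```python
import torch
import torch.nn as nn

class MAFM(nn.Module):
    """Sketch of a multiscale attention fusion module on a skip connection."""

    def __init__(self, low_ch, high_ch, reduction=4):
        super().__init__()
        ch = low_ch + high_ch
        # Channel attention on the concatenated low/high-level features.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv3d(ch // reduction, ch, 1), nn.Sigmoid())
        # Anisotropic dilated branches: kernel 1x3x3 in (D, H, W) order,
        # dilated only in-plane to cope with the lower axial resolution.
        self.branches = nn.ModuleList([
            nn.Conv3d(ch, ch, kernel_size=(1, 3, 3),
                      padding=(0, d, d), dilation=(1, d, d))
            for d in (1, 6, 12)])
        self.reduce = nn.Conv3d(ch, ch, kernel_size=1)

    def forward(self, low, high):
        x = torch.cat([low, high], dim=1)                 # cross-level fusion
        xc = x * self.channel_att(x)                      # channel re-weighting
        xs = sum(branch(xc) for branch in self.branches)  # spatial fusion
        weights = torch.sigmoid(self.reduce(xs))          # spatial-channel weights
        return x * weights                                # re-weight original maps
```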

3.5. FSM

In popular encoder–decoder architectures such as U-Net [12] and V-Net [13], skip connections are commonly used to combine feature maps from encoder layers with those from symmetric positions in the decoder layers. However, simply fusing features through channel concatenation may lead to feature redundancy, as not all features effectively capture the critical information in neuronal images. Therefore, we introduced a novel FSM. The detailed structure of FSM is shown in Figure 6. Each FSM has two inputs: the feature map Fi from the previous decoder layer and the feature map Fi+1 from the current corresponding skip connection output. Initially, Fi and Fi+1 undergo element-wise summation for initial feature fusion, and then the feature flow is split into two independent branches. The first branch employs a pointwise convolutional layer to reduce feature dimensionality and perform fine-grained feature extraction. In contrast, the second branch adopts a different strategy, applying global average pooling to the input to capture global channel context. The calculation of local channel context L(Fi+1) is given by equation (7). By combining the given global channel context G(Fi) and local channel context L(Fi+1), and then passing it through a sigmoid activation function, we obtain multiscale channel attention weights M(Fi) as defined in equation (8). Finally, a weighted average of Fi and Fi+1 is taken to obtain the final fused output feature, as defined in equation (9).
()
()
()
where i ∈ {1, 2, 3}, B denotes the batch normalization layer, and δ denotes the ReLU activation function. The FSM addresses the scale of the channels through point-wise convolution rather than convolutional kernels of different sizes, which keeps the module as lightweight as possible. The dashed line in Figure 6 represents 1 − M(Fi ⊕ Fi+1), which implements a soft selection mechanism by subtracting the fused weights from 1. This mechanism flexibly and adaptively adjusts the weights based on the importance of the features, thereby retaining more valuable information during feature fusion. This approach not only avoids feature redundancy but also enhances the robustness and generalization ability of the model.
Figure 6: Schematic diagram of the feature selection module.
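The soft selection described above can be sketched as follows, with a point-wise-convolution local branch and a global-average-pooling global branch producing the attention map M, and the output computed as M ⊙ Fi + (1 − M) ⊙ Fi+1. The bottleneck ratio and exact layer ordering are assumptions modeled on attentional feature fusion rather than the authors' code.

```python
import torch
import torch.nn as nn

class FSM(nn.Module):
    """Sketch of a feature selection module fusing a decoder feature map
    with the corresponding skip-connection feature map."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 1)
        # Local channel context: point-wise convolutions keep the module light.
        self.local = nn.Sequential(
            nn.Conv3d(channels, mid, 1), nn.BatchNorm3d(mid),
            nn.ReLU(inplace=True), nn.Conv3d(mid, channels, 1),
            nn.BatchNorm3d(channels))
        # Global channel context from global average pooling.
        self.globl = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv3d(mid, channels, 1))

    def forward(self, f_dec, f_skip):
        fused = f_dec + f_skip                        # initial element-wise fusion
        m = torch.sigmoid(self.local(fused) + self.globl(fused))
        # Soft selection: weights near 1 favour the decoder feature,
        # weights near 0 favour the skip-connection feature.
        return m * f_dec + (1.0 - m) * f_skip
```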

3.6. Loss Function

The binary cross-entropy (BCE) loss function is widely used in classification problems. It quantifies the error by independently evaluating the category prediction for each voxel and then averaging over all voxels. However, in 3D neuronal microscopic image segmentation, the number of voxels in each category is often highly imbalanced, with foreground voxels accounting for only a small portion of the image stack. This can bias the model toward predicting background voxels during training. To mitigate this class imbalance, we introduce the BCE-Dice hybrid loss function. The BCE loss helps the model learn discriminative features, thereby enhancing its classification capability, while the Dice loss measures the similarity between the segmentation results generated by the model and the true segmentation labels, with particular emphasis on maintaining spatial continuity. Through the combination of these two losses, the model can learn richer feature representations and thus perform better in semantic segmentation tasks. The definitions of the BCE loss and the Dice loss are shown in the following equations.
L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i\log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big]

L_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{N} y_i\hat{y}_i + \varepsilon}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N}\hat{y}_i + \varepsilon}
where y_i denotes the true label of the ith voxel in the sample and \hat{y}_i denotes the predicted probability for the ith voxel. In the Dice loss, a very small constant ε is added to prevent the denominator from being zero. The overall loss function of the model is shown in the following equation.
()

The weights of the two loss functions can be controlled by adjusting the value of α in order to achieve the best prediction results.
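A minimal PyTorch sketch of such a hybrid loss is given below. It assumes the network outputs voxel-wise logits and combines the two terms as the convex combination α·L_BCE + (1 − α)·L_Dice, which is one common way of "controlling the weights by adjusting α"; the exact weighting used by the authors may differ.

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Sketch of a hybrid BCE-Dice loss for binary voxel segmentation."""

    def __init__(self, alpha=0.4, eps=1e-6):
        super().__init__()
        self.alpha, self.eps = alpha, eps
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, target):
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum()
        # Soft Dice loss with a small epsilon guarding the denominator.
        dice = 1.0 - (2.0 * inter + self.eps) / (
            prob.sum() + target.sum() + self.eps)
        return self.alpha * self.bce(logits, target.float()) \
            + (1.0 - self.alpha) * dice
```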

4. Experiment and Result

4.1. Datasets

  • 1.

    BigNeuron Dataset: To assess the performance of the presented model, we utilize the gold-166 dataset obtained from the BigNeuron project [20], which encompasses a variety of species such as fish, humans, and mice. The dimensions of the image stacks exhibit variability, such as 2047 × 890 × 23, 724 × 1024 × 33, and 1018 × 543 × 29. To validate the proficiency of our proposed model, we selected 68 neuron images of different types, randomly using 3/4 of them for training and 1/4 for testing.

  • 2.

Complex Whole Mouse Brain Sub-Image (CWMBS): In contrast to the BigNeuron dataset, the CWMBS dataset [25] poses a greater challenge due to the notable variations in voxel characteristics across images. It comprises 83 images with complex background noise and 162 images featuring intricate filamentous structures, all with dimensions of 256 × 256 × 256 voxels and a spatial resolution of 0.2 × 0.2 × 1.0 μm/voxel. These images are derived from a complete mouse brain and are professionally labeled. We selected 20 images with cluttered background noise and 20 images with fine fibrous structures as the test set, while the rest were used as the training set.

  • 3.

Data Labeling: When training deep learning segmentation networks, a large amount of accurately labeled data provides correct guidance to the network. The brain neuron datasets used here provide expert manual reconstructions corresponding to each neuron image, where each record in the SWC file contains five attributes of a neuron node, namely, index, type, location (x, y, and z coordinates), radius, and parent node index [35]. Therefore, the Euclidean distance transform (EDT) and this neuron information can be used to automatically generate segmentation labels for each neuron image. The value of each voxel in the EDT represents the shortest geometric distance to the target point set, which is defined as shown in the following equation.

T(p) = \min_{c \in C} d(p, c), \quad p = (x_1, y_1, z_1), \; c = (x_2, y_2, z_2)

where (x1, y1, z1) denotes the 3D coordinates of a voxel in the neuron image, C denotes the set of target points formed by the points on the neuron centerline, and (x2, y2, z2) denotes the 3D coordinates of the voxels in that set. The Euclidean distance is defined as shown in the following equation.

d(p, c) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}

Points in the EDT with distance values less than the radius of the neuron structure are marked as 1, and the rest are marked as 0, as shown in the following equation. Here, p and r, respectively, represent the 3D coordinates of a voxel and the radius of the neuron structure at that point, and T(p) represents the shortest distance from voxel p to the centerline of the ground truth reconstruction. Therefore, each voxel in the neuron image is classified as background (0) or neurite (1).
\mathrm{label}(p) = \begin{cases} 1, & T(p) < r \\ 0, & \text{otherwise} \end{cases}
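The labeling rule above maps directly onto SciPy's Euclidean distance transform. The sketch below is a simplified illustration that rasterizes only the SWC node positions (ignoring the edges between nodes) and compares each voxel's distance against the radius of its nearest centerline node; the function name and this simplification are our own assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def swc_to_mask(nodes, shape):
    """Generate a binary neurite mask from SWC nodes.

    `nodes` is assumed to be a list of (x, y, z, radius) tuples read from
    the SWC file, and `shape` is the (D, H, W) shape of the image stack.
    """
    centerline = np.ones(shape, dtype=bool)          # True = background
    radius = np.zeros(shape, dtype=np.float32)
    for x, y, z, r in nodes:
        zi, yi, xi = int(round(z)), int(round(y)), int(round(x))
        if 0 <= zi < shape[0] and 0 <= yi < shape[1] and 0 <= xi < shape[2]:
            centerline[zi, yi, xi] = False            # centerline voxel
            radius[zi, yi, xi] = r
    # T(p): shortest distance from every voxel to the centerline point set,
    # together with the index of the nearest centerline voxel.
    dist, idx = distance_transform_edt(centerline, return_indices=True)
    nearest_radius = radius[idx[0], idx[1], idx[2]]
    # Voxel is neurite (1) if it lies within the radius of its nearest node.
    return (dist < nearest_radius).astype(np.uint8)
```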

4.2. Implementation Details

In the experiments, because the two datasets differ significantly in image size, resolution, signal-to-noise ratio, and neuron morphology, they were trained and tested separately. Furthermore, owing to limitations in computational resources and variations in image dimensions, neuron images were cropped into nonoverlapping 128 × 128 × 32 blocks during the training phase. Data augmentation was performed through random flipping in the XY plane, random translation, and noise addition. In the testing phase, the entire test image of arbitrary size was used as input. The model was implemented in the PyTorch framework and run on an Intel Core CPU with a 24-GB Nvidia RTX 3090 GPU. During training, the SGD optimizer was used for network optimization, with an initial learning rate of 0.1, a weight decay of 0.0005, and 50 training epochs. To alleviate the issue of sample imbalance in 3D neuron images, we employed a weighted combination of the BCE and Dice losses. To investigate the impact of different α values on the trained network, we set different α values on the BigNeuron dataset for training and testing and used APP2 to reconstruct neurons from the segmented images. We then evaluated the network's performance by calculating segmentation metrics such as Precision and IoU, as well as neuron reconstruction metrics including ESA, DSA, and ADS. As shown in Figure 7, when α = 0.4, the network achieved the best overall performance and the most accurate prediction results. Therefore, we set α to 0.4 in subsequent experiments.

Figure 7: Effect of different α values on segmentation results (precision and IoU) and reconstruction results (ADS, ESA, and DSA).
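For illustration, the sketch below applies the three augmentations listed above to a single (D, H, W) training block; the flip probabilities, translation range, and noise level are illustrative assumptions, as the paper does not report them.

```python
import numpy as np

def augment_cube(cube, rng=None):
    """Apply random XY flips, an in-plane translation, and additive noise
    to one (D, H, W) training block."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        cube = cube[:, ::-1, :]                       # flip along y
    if rng.random() < 0.5:
        cube = cube[:, :, ::-1]                       # flip along x
    shift = rng.integers(-4, 5, size=2)               # small in-plane translation
    cube = np.roll(cube, shift, axis=(1, 2))
    cube = cube + rng.normal(0.0, 0.01, cube.shape)   # additive Gaussian noise
    return np.ascontiguousarray(cube)
```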

4.3. Evaluation Metrics

To evaluate the performance of the model more comprehensively, we measured the segmentation results and the neuron reconstruction results separately. We used the following three metrics for quantitative analysis of segmentation performance: recall, precision, and IoU, defined in the following equations. Higher values indicate better neuron segmentation performance.
\mathrm{Recall} = \frac{TP}{TP + FN}

\mathrm{Precision} = \frac{TP}{TP + FP}

\mathrm{IoU} = \frac{TP}{TP + FP + FN}

where TP, FP, and FN denote the numbers of true-positive, false-positive, and false-negative voxels, respectively.
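A minimal sketch of these voxel-wise metrics for binary masks (the function name and the small epsilon guard are ours):

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Compute recall, precision, and IoU for binary voxel masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    eps = 1e-8                                   # avoid division by zero
    return {"recall": tp / (tp + fn + eps),
            "precision": tp / (tp + fp + eps),
            "iou": tp / (tp + fp + fn + eps)}
```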

Meanwhile, we employed three distance scores to evaluate the discrepancy between the generated reconstruction and the reference reconstruction, namely, the entire structure average (ESA), the different structure average (DSA), and the percentage of different structures (ADS), as defined in [36]. They are calculated as follows. For each node in the manual reconstruction, we compute the minimum spatial distance between that node and all nodes in the reconstruction generated by the computational method. The ESA is obtained by averaging these reciprocal minimum spatial distances across the entire structure. Since nodes separated by less than 2 voxels are difficult to distinguish visually, the DSA is derived by averaging the distances of node pairs whose distance between the generated and manual reconstructions exceeds 2 voxels. The ADS is the proportion of node pairs whose reciprocal minimum spatial distance exceeds 2 voxels among all node pairs. These three metrics reflect the correctness of neuronal node connections, and their values can be generated by the Vaa3D software plugin [21], which directly provides the three distance measures (ESA, DSA, and ADS) by comparing the computed neuronal reconstruction (generated SWC file) with the standard manual reconstruction (standard SWC file). Lower values indicate smaller discrepancies between the generated and standard reconstructions, and hence better neuronal reconstruction performance.
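As a rough, one-directional illustration of these scores (the values reported in this paper come from the Vaa3D plugin, which also accounts for the reciprocal direction), a simplified sketch is:

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_scores(auto_nodes, gt_nodes, threshold=2.0):
    """Simplified ESA/DSA/ADS computed from manual (gt) nodes to the
    automatically traced nodes; each input is an (N, 3) array of xyz."""
    auto_nodes = np.asarray(auto_nodes, dtype=float)
    gt_nodes = np.asarray(gt_nodes, dtype=float)
    # Minimum spatial distance from every manual node to the auto tracing.
    d, _ = cKDTree(auto_nodes).query(gt_nodes)
    esa = d.mean()                                  # entire-structure average
    far = d[d > threshold]                          # visibly different nodes
    dsa = far.mean() if far.size else 0.0           # different-structure average
    ads = far.size / d.size                         # % of different structures
    return esa, dsa, ads
```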

4.4. Evaluation on Images From the BigNeuron Dataset

To comprehensively evaluate the effectiveness of our proposed segmentation method in enhancing the performance of reconstruction algorithms, we conducted extensive comparative experiments. In these comparisons, our method was benchmarked not only against a series of advanced neuronal segmentation methods, including 3D U-Net (M1) [12], 3D Wave-Net (M2) [14], ER-Net (M3) [25], and ADTL-Net (M4) [26], but also against SwinUNETR (M5) [30] and 3D UX-Net (M7) [31], which currently perform exceptionally well in the field of 3D image processing. These comprehensive comparative experiments aim to validate the effectiveness and superiority of the segmentation model.

Table 2 shows the segmentation performance of the different segmentation methods. It is evident from the test results that 3D FreqSNet outperforms the compared segmentation methods on most metrics. Furthermore, after segmenting the neurons within the cubes, we assembled the cubes into complete 3D neuronal images and employed the APP2 algorithm to reconstruct the assembled neurons, generating automatically reconstructed SWC files. We then calculated the differences between these automatically reconstructed SWC files and the manually reconstructed ones using the Vaa3D software plugin [21], obtaining ESA, DSA, and ADS values to assess the accuracy of the automatic reconstruction. Table 3 shows the three average metrics for the 17 test neurons reconstructed with the APP2 algorithm on the segmentation results of the different segmentation methods. It can be observed that 3D FreqSNet achieves the best results on the reconstruction metrics ESA and ADS. Although 3D UX-Net obtains a relatively good DSA value, it falls far behind the proposed method on the ESA and ADS metrics, because 3D UX-Net generates erroneous segmentations by focusing excessively on noise. Figure 8 intuitively demonstrates the advantages of 3D FreqSNet. Figures 8(a1) and 8(b1) show neurons with strong background noise, and Figures 8(a2) and 8(b2) show that 3D FreqSNet effectively removes the noise. Figures 8(b1) and 8(c1) show neurons with weak filament signals, and Figures 8(b2) and 8(c2) show that 3D FreqSNet effectively enhances the weak neuronal signals. 3D FreqSNet significantly improves the segmentation quality of BigNeuron images and enhances the neuron reconstruction results. Figure 9 visualizes the reconstruction results generated by various segmentation methods together with the ground truth reconstruction. 3D FreqSNet obtains the most accurate and complete reconstruction results among all compared methods, while the other methods struggle to obtain precise reconstructions and often miss fine neuronal fiber structures (a5 and c4 in Figure 9). The reconstruction results of the three complex neuronal images are largely consistent with the ground truth, demonstrating the robustness and versatility of the proposed method.

Table 2. Quantitative comparison of segmentation results of different segmentation methods on the BigNeuron test dataset.
Models Categories Recall↑ Precision↑ IoU↑
3D U-Net Background 0.9952 0.9988 0.9941
Target neuron 0.7611 0.4329 0.3812
  
3D Wave-Net Background 0.9959 0.9986 0.9946
Target neuron 0.7176 0.4585 0.3884
  
ADTL-Net Background 0.9849 0.9997 0.9846
Target neuron 0.9425 0.2281 0.2250
  
ER-Net Background 0.9949 0.9986 0.9935
Target neuron 0.7178 0.4001 0.3457
  
SwinUNETR Background 0.9964 0.9984 0.9949
Target neuron 0.6825 0.4731 0.3878
  
3D UX-Net Background 0.9918 0.9992 0.9911
Target neuron 0.8373 0.3270 0.3075
  
Ours Background 0.9968 0.9985 0.9954
Target neuron 0.6981 0.5118 0.4190
  • Note: The bold values represent the best results for the target neuron and background, respectively.
  • ↑ indicates that the larger the results of Recall, Precision, and IoU, the better.
Table 3. Quantitative comparison of the reconstruction performance of different segmentation methods on the BigNeuron dataset.
Models ESA↓ DSA↓ ADS↓
3D U-Net 4.9546 10.1750 0.2584
3D Wave-Net 9.0442 15.6249 0.3361
ADTL-Net 8.3560 14.5661 0.3393
ER-Net 6.6785 11.8864 0.2735
3D UX-Net 4.3721 5.9841 0.6759
SwinUNETR 15.9690 21.3220 0.3538
Ours 2.4356 6.9410 0.2120
  • Note: The bold values indicate that this method performs the best among all methods in each column.
  • ↓ indicates that the smaller the results of ESA, DSA, and ADS, the better the performance of this method.
Figure 8: Qualitative comparison of visualization results for neuron reconstruction in three test sets from the BigNeuron dataset. a1, b1, and c1 are the original neuron images. a2, b2, and c2 are the segmentation results obtained using our method. a3, b3, and c3 are masks produced by ground truth. a4, b4, and c4 are the reconstruction results of the original images. And a5, b5, and c5 are the reconstruction results after segmentation.
Figure 9: Visualized qualitative comparison of neuronal APP2 reconstruction results under different segmentation methods for the three test set images in the BigNeuron dataset.

4.5. Evaluation on Images From the CWMBS Dataset

To demonstrate the advantages of the proposed method in processing more challenging images of different categories, we conducted additional experiments using the CWMBS dataset. In this experiment, two groups of images with significant feature differences, namely, strong noise images and weak signal images, were selected to test and validate our method. As shown in Table 4, compared with six other state-of-the-art segmentation methods, the proposed method achieved the best performance in terms of recall, precision, and IoU. The qualitative visualization results of neuron segmentation are shown in Figure 10. By comparing (a1) with (a2) and (b1) with (b2), it is clear that 3D FreqSNet effectively removes noise and enhances the faint filament structures. We also visualized the reconstruction results using the APP2 method on both the original and segmented images, and it is evident that our method makes reconstruction easier and more accurate.

Table 4. Quantitative comparison of segmentation results of different segmentation methods on the CWMBS test dataset.
Models Categories Recall↑ Precision↑ IoU↑
3D U-Net Background 0.9990 0.9992 0.9982
Target neuron 0.2342 0.1904 0.1173
  
3D Wave-Net Background 0.9996 0.9991 0.9989
Target neuron 0.1185 0.3355 0.0957
  
ADTL-Net Background 0.9992 0.9992 0.9984
Target neuron 0.2319 0.2321 0.1312
  
ER-Net Background 0.9981 0.9997 0.9979
Target neuron 0.7879 0.2945 0.2728
  
SwinUNETR Background 0.9994 0.9995 0.9989
Target neuron 0.4936 0.4730 0.3185
  
3D UX-Net Background 0.9993 0.9995 0.9989
Target neuron 0.5548 0.4506 0.3309
  
Ours Background 0.9993 0.9996 0.9989
Target neuron 0.5985 0.4747 0.3601
  • Note: Each column in bold represents the best results for the target neuron and background, respectively.
  • ↑ indicates that the larger the results of Recall, Precision, and IoU, the better.
Figure 10: Visualized qualitative comparison of neuron reconstruction results for the CWMBS dataset. a1, b1, and c1 are the original neuron images. a2, b2, and c2 are the segmentation results of our method. a3, b3, and c3 are masks produced by ground truth. a4, b4, and c4 are the reconstruction results of the original images, and a5, b5, and c5 are the reconstruction results of the segmented images.

The comparison of reconstruction results using different segmentation methods is shown in Figure 11. Previous state-of-the-art methods such as 3D U-Net, ER-Net, and ADTL-Net produce discontinuous segmentations, leading to incomplete reconstructed structures (b2, b3, and b4 in Figure 11). SwinUNETR oversegments noisy images, resulting in unnecessary scatter (b5 in Figure 11). In contrast, 3D FreqSNet achieves better reconstruction results, further demonstrating that our method greatly benefits noise reduction and enhancement of neuron structure signals.

Figure 11: Visualized qualitative comparison of APP2 reconstruction results under different segmentation methods for the three test set images in the CWMBS dataset.

4.6. Comparison of Different Reconstruction Algorithms

We further quantitatively evaluated the impact of 3D FreqSNet on different neuron reconstruction algorithms. Comparisons with the other segmentation methods were conducted using five available neuron reconstruction algorithms, namely, APP1 [16], APP2 [2], NeuroGPS-Tree (GPS) [37], MST_Tracing (MST) [38], and Snake [39]. The box plots of the quantitative analysis are shown in Figures 12(a), 12(b), and 12(c). Compared with reconstructing neuron structures directly from the images without any preprocessing, the segmentation maps generated by the proposed model yielded significant improvements in the various reconstruction metrics. It can also be observed that, across the five reconstruction algorithms, 3D FreqSNet outperformed the other segmentation methods on most neuron reconstruction metrics. Due to the loss of a large number of nerve fibers in the segmentation results of 3D Wave-Net (M2) and ADTL-Net (M4), their reconstruction results were worse than those obtained on the original images. Similarly, as shown in Figures 12(d), 12(e), and 12(f), 3D FreqSNet significantly improved the performance of the five reconstruction algorithms on the CWMBS test images, outperforming the reconstructions on the original images in the various quantitative metrics and achieving better reconstruction metrics than the M1–M6 segmentation methods. Overall, our proposed method extracts more complete neuron structures when processing complex images, which greatly simplifies the subsequent neuron tracing process and further improves the reconstruction performance on the BigNeuron and CWMBS datasets.

Figure 12: (a–c) Quantitative comparison of the reconstruction performance of different segmentation methods (M1–M6) and the nonsegmented original data on the BigNeuron dataset using various reconstruction algorithms (APP1, APP2, GPS, MST, and Snake). (d–f) The corresponding comparison on the CWMBS dataset.

4.7. Computational Complexity Analysis

In a clinical setting, computational complexity is an important metric. The number of model parameters and the number of floating-point operations (FLOPs) are used to evaluate the computational complexity of the different methods. Our model has 6.24 M parameters and requires 143.25 G FLOPs. As shown in Figure 13, SwinUNETR (M5) and 3D UX-Net (M7) have large parameter counts and high computational costs due to the quadratic complexity of their self-attention mechanisms, which limits their application to large-scale neuronal reconstruction. Methods such as 3D U-Net (M1), 3D Wave-Net (M2), ER-Net (M3), and ADTL-Net (M4) have relatively small parameter counts and FLOPs, but their neuronal reconstruction performance is not high. As can be seen from Figure 13 and Tables 3 and 5, our method achieves a better balance between accuracy and computational cost, making it more suitable for clinical experiments and the reconstruction of very large-scale neurons.
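For reference, the parameter count reported above can be reproduced with a small PyTorch helper; FLOPs are usually measured with an external profiler on a single 32 × 128 × 128 input cube, which is omitted from this sketch.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Number of trainable parameters in millions (the 'Parm (M)' metric)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Example with a stand-in module (not the actual 3D FreqSNet).
print(f"{count_parameters(nn.Conv3d(1, 16, 3)):.4f} M parameters")
```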

Figure 13: Comparison of the number of parameters and floating-point operations of different models.
Table 5. Quantitative comparison of the reconstruction performance of different segmentation methods on the CWMBS dataset.
Category Method ESA↓ DSA↓ ADS↓
Strong noise 3D U-Net 15.0077 19.8790 0.4669
3D Wave-Net 24.3382 29.0396 0.4943
ER-NET 22.3417 27.1394 0.4831
ADTL-Net 12.6981 17.2158 0.4778
3D UX-Net 5.3132 13.0049 0.2672
SwinUNETR 6.8669 13.0615 0.2987
Ours 4.7137 12.7418 0.2413
  
Weak signal 3D U-Net 24.7178 29.0057 0.4561
3D Wave-Net 36.7628 40.7563 0.5590
ER-NET 32.2134 41.8694 0.4647
ADTL-Net 23.3912 30.7686 0.5375
3D UX-Net 9.1634 16.6115 0.2569
SwinUNETR 13.2005 19.1474 0.3464
Ours 6.3835 13.0442 0.2126
  • Note: The bold values indicate that this method performs the best among all methods in each column.
  • ↓ indicates that the smaller the results of ESA, DSA, and ADS, the better the performance of this method.

4.8. Ablation Experiments

4.8.1. Ablation Experiments With 3D FreqSNet

In this method, we propose the FSRA, the MAFM, and the FSM. To demonstrate the effectiveness of these modules, comprehensive ablation experiments were conducted on both the BigNeuron dataset and the CWMBS dataset. As shown in Table 6, the APP2 reconstruction method was used to evaluate 17 images from BigNeuron and 40 images from CWMBS. In addition, we calculated the average inference time on the BigNeuron test images for each variant. First, to verify the effectiveness of the backbone network, we replaced the original blocks in 3D U-Net with ResBlocks, denoting the result ResU-Net. Comparing Tables 3 and 6, ResU-Net improved the ESA, DSA, and ADS reconstruction results by 1.1002, 1.3456, and 0.0363, respectively, on the BigNeuron dataset. Furthermore, comparing Tables 5 and 6, the ESA, DSA, and ADS of 3D U-Net on the CWMBS dataset are 19.8627, 24.4423, and 0.4615, respectively, while those of ResU-Net are 10.1511, 15.9947, and 0.3368, which further demonstrates the effectiveness of the backbone network. We then conducted ablation experiments on the proposed modules based on ResU-Net (Net1).

Table 6. Ablation experiment results for different modules.
Design | Parm (M) | Inference time (s) | BigNeuron ESA↓ | DSA↓ | ADS↓ | CWMBS ESA↓ | DSA↓ | ADS↓
Net1 (ResU-Net) | 7.34 | 7.44 | 3.8554 | 8.8294 | 0.2220 | 10.1511 | 15.9947 | 0.3368
Net2 (FSRA) | 6.42 | 7.11 | 3.2351 | 7.8325 | 0.2132 | 8.0932 | 15.7590 | 0.2563
Net3 (FSRA + FSM, without MAFM) | 5.28 | 11.67 | 2.8864 | 7.5045 | 0.2199 | 7.9587 | 15.4347 | 0.2560
Net4 (FSRA + MAFM, without FSM) | 6.82 | 18.23 | 2.7337 | 7.0147 | 0.2103 | 6.4211 | 14.5223 | 0.2358
Net5 (FSRA + MAFM + FSM, full model) | 6.24 | 15.23 | 2.5048 | 6.9185 | 0.2095 | 5.5486 | 12.8930 | 0.2269
  • Note: The bold values indicate that this method performs the best among all methods in each column.
  • ↓ indicates that the smaller the results of ESA, DSA, and ADS, the better the performance of this method.

Net2 replaces the ResBlock in Net1 with the FSRA. On the BigNeuron and CWMBS datasets, the neuron reconstruction metrics ESA improved by 0.6203 and 2.0579, respectively, DSA improved by 0.9969 and 0.2357, respectively, and ADS improved by 0.0088 and 0.0805, respectively. Compared to only retaining the spatial domain (Net1), introducing the FFT (Net2) significantly improves performance, indicating that the model can capture richer global information in the frequency domain, which is beneficial for improving neuron structure reconstruction performance. The FSRA module enhances the network’s learning capability by simultaneously learning information from both the spatial and frequency domains, thereby reducing noise interference and better capturing texture shapes.

To demonstrate the effectiveness of the MAFM, we removed it from our model: the ESA on the two datasets increased by 0.3816 and 2.4101, respectively, the DSA increased by 0.5860 and 2.5417, respectively, and the ADS increased by 0.0104 and 0.0291, respectively, owing to the loss of the ability to exploit cross-layer contextual features.

Similarly, if the FSM is not used, the model's ability to adaptively select important features is greatly weakened, which is also clearly reflected in the reconstruction metrics on the two datasets: the ESA increased by 0.2289 and 0.8725, respectively, the DSA increased by 0.0962 and 1.6293, respectively, and the ADS increased by 0.0008 and 0.0089, respectively. In addition, the model parameters and inference time also increased slightly, because the absence of the FSM introduces redundant feature computations and reduces inference efficiency.

The outstanding performance of 3D FreqSNet is attributed to the synergistic effect of the FSRA, the MAFM, and the FSM, which together enable the model to effectively capture complex spatial and frequency dependencies and to accurately and robustly extract the features needed for neuron image segmentation.

4.8.2. Ablation Experiments on the FSRA Module

To verify the impact of the FFT and spatial-domain convolutional branches on the internal performance of the FSRA, we conducted ablation experiments on the FSRA, as shown in Table 7. First, after removing both the FFT and the 3 × 3 × 3 convolutional block (replacing them with a 1 × 1 × 1 convolution), the ESA, DSA, and ADS on the BigNeuron dataset were 3.3526, 8.6201, and 0.2446, respectively, while those on the CWMBS dataset were 11.6143, 21.0707, and 0.2998, respectively. Compared with retaining the FFT and the 3 × 3 × 3 convolution, performance decreased, indicating that ordinary convolutions struggle to fully capture global information. Next, compared with the case of using neither the FFT nor the 3 × 3 × 3 convolution, introducing only the spatial-domain convolution did not significantly improve the ESA, DSA, and ADS values. In contrast, introducing only the FFT significantly improved performance, indicating its ability to capture rich global information in the frequency domain. Finally, when both the FFT and the 3 × 3 × 3 convolution were used, the ESA, DSA, and ADS on the BigNeuron dataset were 2.5048, 6.9185, and 0.2095, respectively, while those on the CWMBS dataset were 5.5486, 12.8930, and 0.2269. This demonstrates that combining the FFT and the 3 × 3 × 3 convolution effectively exploits the complementary advantages of the frequency and spatial domains, achieving excellent segmentation and reconstruction performance.

Table 7. Results of ablation experiments of the FSRA module.
FFT | Spatial branch | BigNeuron ESA | DSA | ADS | CWMBS ESA | DSA | ADS
— | — | 3.3526 | 8.6201 | 0.2446 | 11.6143 | 21.0707 | 0.2998
— | ✓ | 3.9955 | 8.5949 | 0.2420 | 11.8177 | 17.5911 | 0.3517
✓ | — | 2.8876 | 7.1483 | 0.2195 | 6.8812 | 13.4469 | 0.2723
✓ | ✓ | 2.5048 | 6.9185 | 0.2095 | 5.5486 | 12.8930 | 0.2269
  • Note: The bold values indicate that this method performs the best among all methods in each column.

4.8.3. Visualization of Ablation Experiments

Figure 14 visualizes the reconstruction of an example neuron image. Figure 14(a) shows the original image, and Figure 14(b) shows the manually reconstructed neuron, which accurately depicts the morphological structure of the neuron. Figure 14(c) displays the APP2 reconstruction on the original image, while Figure 14(d) shows the APP2 reconstruction based on the segmentation map of ResU-Net. The following three images are the APP2 reconstructions of neuron images segmented by the different ablation variants, and Figure 14(h) presents the APP2 reconstruction of the neurons segmented by 3D FreqSNet. On the original image, APP2 is not only susceptible to noise interference, over-tracing noise as nerve fibers, but also stops tracing neurons with weak signals. By comparing Figures 14(d), 14(e), 14(f), 14(g), and 14(h), it can be observed that adding the FSRA, MAFM, and FSM progressively improves the neuron reconstruction performance, further demonstrating the effectiveness of these modules. As indicated by the yellow arrows, 3D FreqSNet captures richer texture details and achieves more complete reconstructions, illustrating its ability to suppress noise and enhance weak neuronal fiber structures.

Figure 14: Visualization results under different ablation methods.

5. Conclusion

In this study, we proposed a frequency and spatial domain–integrated encoder–decoder 3D segmentation network, called 3D FreqSNet, which mainly consists of three modules: the FSRA, the MAFM, and the FSM. First, the FSRA extracts global and local information from frequency and spatial domain feature maps, effectively integrating frequency-spatial information and reducing the network's susceptibility to noise. Second, the MAFM enhances multiscale contextual perception by integrating and fusing spatial and channel positional information. Finally, the FSM avoids feature redundancy by adaptively selecting features from the encoder and decoder at the same hierarchy level. We conducted extensive experiments on two complex 3D neuron image datasets, and the results show that the proposed model not only significantly improves the reconstruction results of five existing neuron tracing algorithms but also demonstrates clear advantages over other advanced neuron segmentation methods in improving neuron reconstruction performance. At the same time, it exhibits higher accuracy in segmenting neuron images with strong background noise and weak voxel signals. In addition, 3D FreqSNet achieves a good balance between accuracy and computational cost, which is beneficial for large-scale neuron reconstruction.

However, this study still has some limitations. First, the weight α of the proposed loss function needs to be manually tuned for each dataset, which relies on subjective experience. To address this, we will develop an automatic method for the adaptive setting of α in the future. Second, the proposed segmentation algorithm is based on fully supervised learning and requires corresponding manually reconstructed annotation data. Large-scale manual annotation of neurons demands extensive neurobiological knowledge and human resources, making such data difficult to obtain. To reduce the need for a large amount of annotated data, we will explore semisupervised learning, transfer learning, or active learning methods in the future. Finally, this study only explores the reconstruction of neurons on two neuron image datasets. In the future, we will further explore the reconstruction of ultra-large-scale neuronal clusters on whole-brain images of animals, which will contribute to the analysis of brain neural networks and brain functional mechanisms, thereby promoting further research on brain cognition, brain diseases, and other topics.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Haixing Song: writing–review and editing, writing–original draft, software, methodology, formal analysis, and conceptualization. Xuqing Zeng: writing–review and editing, visualization, and validation. Guanglian Li: supervision, resources, and formal analysis. Rongqing Wu: methodology and conceptualization. Simin Liu: resources and data curation. Fuyun He: supervision, resources, project administration, and funding acquisition.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62062014) and the grant from Guangxi Key Laboratory of Brain-inspired Computing and Intelligent Chips (No. BCIC-23-Z1).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
