Introduction
Lung illness is widespread across the world (Bharati et al., 2020), and lung cancer is currently the most common cause of cancer-related death among both genders globally. Cigarette smoking is believed to be its primary cause. Lung cancer can develop in any part of the lung, and 90%-95% of cases are assumed to derive from the epithelial cells that line the larger and smaller airways (Alberg & Samet, 2003). In addition, lung disease often goes undetected, resulting in a rapid decline in life expectancy or costly and dangerous therapies (Yu et al., 2016). The risk of lung disease is significant, especially in developing and low-income countries where billions of individuals live in poverty and are exposed to air pollutants (Mondal et al., 2020). Furthermore, the number of people dying from cancer is expected to rise further, reaching roughly 17 million by 2030 (Dhaware & Pise, 2016). The only way to cure lung cancer is to detect it at the earliest stages (Silva et al., 2016).
Lung illness can be identified by a range of imaging methods, including MRI, isotope imaging, X-ray, and CT. Chest X-ray radiography and computed tomography (CT) are two well-known structural imaging modalities frequently employed in the diagnosis of various lung disorders (Park et al., 2011). Physicians utilize CT images to diagnose and confirm the existence of ailments, depict their morphologic range directly, characterize the distribution and intensity of illnesses, track disease progression clinically, and assess how well patients respond to treatment (Nie et al., 2015). Typically, skin tests, blood tests, sputum samples, chest X-rays, and CT scanning are the techniques employed to identify lung disease (Setio et al., 2017). The rate of smoking is far greater in developing nations than in the US, and over the next several years there will likely be a significant increase in the global lung cancer burden (Jemal et al., 2010).
A key challenge in medical imaging is automated phenomenon detection and segmentation from lung X-ray images, with the goal of increasing the accuracy and efficiency of diagnosing different lung conditions (Munawar et al., 2020). Segmentation is the process of partitioning X-ray images into meaningful regions, such as separating the lungs from other anatomical structures, so that illnesses like lung cancer, pneumonia, or tuberculosis can be accurately identified (Souza et al., 2019). By automating this procedure, sophisticated algorithms can reliably and precisely identify pathological phenomena in the lungs, decreasing the need for radiologists to manually interpret images and lowering the possibility of human error. Thanks to the integration of machine learning and deep learning techniques, the field has made important progress in detecting and delineating the boundaries of anomalies with high precision, enabling better diagnosis, treatment planning, and patient outcomes (Teixeira et al., 2021).
The National Lung Screening Trial (NLST) was carried out to determine whether screening with low-dose CT could lower cancer-related mortality (Team, 2011). The trial enrolled 53,454 individuals at high risk for lung cancer across 33 U.S. healthcare centers. Participants were randomized to receive three yearly screenings with either single-view posteroanterior chest radiography (26,732 participants) or low-dose CT (26,722 participants). Over 90% of the population adhered to the testing protocol. Across the three rounds, the rate of positive screening tests was 6.9% for radiography and 24.2% for low-dose CT. False positives accounted for 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group.
This study (Ippolito et al., 2013) shows that, compared with conventional contrast-enhanced PET/CT for lung tumor staging, the paired single-step CT procedure outlined there is feasible and incorporates physiopathological data about the biology of lung cancer related to tumor perfusion and metabolism, without requiring an additional contrast agent and with only an acceptably small increase in radiation exposure for the patient.
This study (Ali et al., 2021) offers an unbiased summary of the gold-standard databases, commonly used imaging modalities, imaging methodologies, and relevant research for each cancer from 2016 to 2021. The primary objective is to methodically investigate the mechanisms for diagnosing cancer in the multiple human organs indicated previously. According to this comprehensive analysis, more than 70% of deep learning investigators report encouraging outcomes when using CNN-based techniques for the early detection of multi-organ cancer. The study also provides an extended discussion of the opportunities, difficulties, and potential solutions for current research.
This study (Riquelme & Akhloufi, 2020) reviews recently proposed state-of-the-art deep learning architectures and methods used as CAD systems for the identification of lung illness. They fall into two groups: (1) false positive reduction systems, which classify a collection of candidate nodules as cancerous or benign; and (2) nodule identification systems, which detect candidate nodules in the original CT scan. The primary attributes of the various methodologies are showcased and their efficacy is scrutinized. The CT lung datasets useful for research are also introduced, and the various strategies are assessed and discussed.
A branch of machine learning called "deep learning" studies methods that draw inspiration from the structure and function of the human brain. Advances in machine learning, specifically in deep learning, facilitate the recognition, measurement, and categorization of patterns in medical imagery (Shen et al., 2017). Deep learning methods have lately shown remarkable effectiveness in a variety of artificial intelligence tasks, including the processing of medical images. A family of deep learning models known as convolutional neural networks (CNNs) has become extremely effective at extracting features from images and categorizing them. CNNs are particularly well suited to difficult healthcare imaging analysis tasks since they are able to recognize complicated patterns and traits from raw pixel data. The scarcity of labeled medical datasets is among the principal obstacles to creating effective deep learning models for image processing in medicine. Large-scale labeled datasets are frequently difficult to come by due to the time-consuming annotation process and the specialized knowledge required to annotate medical images (Wu et al., 2018). Because of this deficiency of medical data, training deep learning algorithms from scratch may result in overfitting and poor performance. Transfer learning presents a viable remedy to this constraint: it uses knowledge gleaned from models previously trained on sizable and varied datasets, like ImageNet, and adapts this information to a particular target task. Deep learning models can learn general features from their pre-trained layers and then refine those layers on the target medical imaging dataset. When the target dataset is small, this method can considerably enhance the reliability of deep learning models (Ma et al., 2020).
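To make the transfer learning idea concrete, the following is a minimal sketch in PyTorch, assuming an ImageNet-pre-trained ResNet-18 backbone and a hypothetical two-class chest X-ray task; the layer choices and hyperparameters are illustrative, not the models developed in this thesis.

```python
# Minimal transfer-learning sketch (illustrative assumptions: ResNet-18 backbone,
# two-class chest X-ray task, Adam optimizer with lr=1e-3).
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 backbone pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained layers so their general features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a new head for the target task,
# e.g. normal vs. abnormal chest X-ray.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new head's parameters are updated during training.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Freezing the backbone reuses the general ImageNet features, while only the new classification head is fitted to the small medical dataset.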
The significance of lung disease detection using deep learning techniques lies in its potential to revolutionize early diagnosis and treatment, thereby improving patient outcomes and reducing healthcare burdens. Deep learning, a subset of artificial intelligence, allows for the development of sophisticated algorithms that can analyze medical imaging data with remarkable accuracy and speed. In the context of lung diseases such as pneumonia, tuberculosis, and lung cancer, early detection is crucial for effective intervention. Deep learning models can efficiently identify subtle patterns and abnormalities in medical images such as chest X-rays and CT scans, enabling healthcare professionals to detect diseases at their nascent stages. This early detection not only enhances the chances of successful treatment but also minimizes the economic and emotional costs associated with advanced-stage diseases.
Background History
In modern times, diagnostic imaging is essential since it allows non-invasive visualization of internal body structures. Chest X-rays and computed tomography (CT) scans are among the most commonly used imaging modalities for diagnosing various diseases, including pneumonia and lung illness. However, manual analysis of these images is time-consuming and subject to inter-observer variability. In the context of medical image analysis, transfer learning has emerged as a powerful technique to address the scarcity of labeled medical data. It takes advantage of the transferable knowledge of models pre-trained on extensive and diverse datasets, such as ImageNet, which contains millions of natural images (Vaishya et al., 2016). By transferring the learned knowledge from the source domain (natural images) to the target domain (medical images), transfer learning allows deep learning models to generalize better and achieve higher performance, even when the target dataset is limited. In the specific case of pneumonia detection using chest X-rays, the research article "Transfer Learning with Deep Convolutional Neural Network (CNN) for Pneumonia Detection Using Chest X-ray" likely explores the application of CNNs and transfer learning techniques (Rahman et al., 2020). The study might focus on using pre-trained CNN models to detect pneumonia in chest X-ray images and assess the model's performance against other existing methods (Sourav & Wang, 2023). The paper may also investigate the interpretability of the CNN model's decisions, aiming to provide insights into the diagnostic reasoning behind the model's predictions.
Research problem
The challenge of precisely identifying and segmenting pathological regions within the lungs, such as tumors, infections, or other abnormalities, from complex and frequently low-contrast medical images lies at the heart of the research problem in automated phenomena detection and segmentation from lung X-ray images. Conventional manual analysis by radiologists takes a lot of time and is prone to error, which could cause delays or inaccurate diagnoses. To overcome the limitations of human interpretation, this calls for the development of reliable, automated systems that can consistently and precisely detect and segment these phenomena. Current algorithms struggle to achieve high accuracy and reliability due to the subtle nature of some pathological changes and the complexity of lung structures. In order to ensure that the detection and segmentation processes are both highly accurate and efficient, the research problem is focused on developing sophisticated machine learning and deep learning models that are capable of comprehending and processing the intricate patterns in lung X-rays. This will ultimately improve patient care and clinical decision-making.
Research Questions
This research seeks to answer the following questions:
How can advanced deep learning models be developed to accurately detect and segment pathological regions within lung X-ray images, and what techniques can enhance the precision and reliability of this segmentation process?
What is the most effective way to implement a comprehensive framework that combines state-of-the-art image processing techniques with deep learning models to achieve efficient and scalable processing of lung X-ray datasets?
How do the proposed deep learning models and framework perform when evaluated on diverse and large-scale lung X-ray datasets, and how do they compare in terms of accuracy, sensitivity, specificity, and computational efficiency?
Research objectives
The following are the main goals of this thesis:
To develop advanced deep learning models capable of accurately detecting and segmenting pathological regions within lung X-ray images, focusing on enhancing the precision and reliability of the segmentation process.
To implement a comprehensive framework that integrates state-of-the-art image processing techniques with the developed deep learning models, ensuring efficient and scalable processing of lung X-ray datasets.
To evaluate the performance of the proposed models and framework through extensive experimentation on diverse and large-scale lung X-ray datasets, assessing metrics such as accuracy, sensitivity, specificity, and computational efficiency, to ensure the robustness and clinical applicability of the system (a minimal sketch of these metric computations is given below).
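As a minimal, illustrative sketch of the evaluation metrics named in the third objective, the helper below computes accuracy, sensitivity, and specificity from a binary confusion matrix; the arrays are synthetic examples, not thesis results.

```python
# Sketch of the named evaluation metrics for binary predictions (illustrative).
import numpy as np

def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Example: ground truth vs. model predictions for ten X-ray images.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'sensitivity': 0.8, 'specificity': 0.8}
```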
Research motivation
The motivation for investigating lung disease detection using deep learning techniques lies in the urgent need to revolutionize and enhance diagnostic capabilities for respiratory conditions. Conventional methods often face challenges related to subjectivity and time constraints, impacting the accuracy and speed of diagnoses. The interpretation of lung X-rays by radiologists is an essential part of traditional diagnostic techniques. However, this process is labor-intensive, prone to human error, and frequently constrained by the availability of qualified professionals, especially in environments with limited resources. Automated systems that can analyze X-ray images quickly, accurately, and reliably are desperately needed, as lung conditions like pneumonia, TB, and lung cancer are becoming more common. By using cutting-edge deep learning techniques to identify and categorize pathological regions, it may be possible to increase diagnostic accuracy and make these services available in areas with a lack of medical expertise.
Research scope
This study’s focus on automated phenomenon detection and segmentation from lung X-rays includes the creation, application, and assessment of cutting-edge deep learning models with the goal of enhancing the precision and effectiveness of lung disease diagnosis. The goal of this research is to develop a strong framework that combines machine learning algorithms with cutting-edge image processing techniques to identify and segment pathological regions in lung X-rays, including infections, tumors, and other abnormalities. To verify the model’s generalizability across various populations and disease types, the study will entail the gathering and analysis of extensive, diverse lung X-ray datasets. The study will employ metrics like accuracy, sensitivity, specificity, and computational efficiency to assess the performance of the suggested models. In order to improve the model’s dependability in actual clinical settings, the scope also entails investigating the difficulties related to the complexity and variability of lung X-ray images, addressing problems like low contrast and overlapping structures. The ultimate objective is to create a system that can be incorporated into clinical workflows and give medical professionals fast, accurate diagnostic support.
The Proposed Contributions of the Dissertation
Using deep learning techniques, the dissertation makes numerous noteworthy contributions to the field of lung disease identification. This dissertation proposes to advance the field of medical imaging by creating a novel deep learning-based framework that greatly improves the accuracy, efficiency, and reliability of lung disease diagnosis. The dissertation focuses on automated phenomena detection and segmentation from lung X-rays. The dissertation will address current shortcomings in image processing and model performance, specifically in managing the complexity of lung structures and the variability in X-ray quality, by introducing novel approaches for segmenting and detecting pathological regions in lung X-rays. The development of a strong, scalable model that performs better in actual clinical settings and can generalize well across a range of patient demographics and lung conditions will be a significant contribution. Through intensive testing on sizable and varied datasets, this study will offer a thorough assessment of the model’s efficacy and provide insightful information about its potential uses in healthcare settings. The dissertation aims to reduce diagnostic errors, improve patient outcomes, and increase accessibility to high-quality diagnostic tools, particularly in resource-constrained environments, by incorporating this advanced model into clinical workflows.
Dissertation Organization
The thesis is organized into five chapters. Chapter 1 provides an overview of lung disease detection using deep learning techniques, including the introduction, research questions, research objectives, and contributions of the work. Chapter 2 reviews related work on lung disease detection. Chapter 3 describes the methodology, including the dataset, preprocessing, techniques, and the general procedures for detecting lung disease in medical imaging data using deep learning. Chapter 4 presents the performance evaluation of lung disease detection through experiments and results, along with the limitations and discussion. Chapter 5 includes the conclusion and future work.
Literature Review
Background
The background in lung disease detection using deep learning techniques underscores the pressing need for advanced diagnostic tools to combat the significant global health challenges posed by respiratory conditions. Lung diseases, including pneumonia, tuberculosis, and lung cancer, represent major global causes of illness and death (Priyadarsini et al., 2023). Traditional diagnostic methods often face limitations regarding precision and effectiveness, particularly in early disease detection. In recent years, the advent of deep learning, a subset of artificial intelligence, has shown remarkable promise in transforming the landscape of medical imaging and diagnostics.
Deep learning methods, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel at learning intricate patterns and features from large datasets, making them well suited to the analysis of medical images like chest X-rays and CT scans (Asuntha & Srinivasan, 2020). The ability of deep learning frameworks to discern subtle abnormalities holds great promise for the early detection of lung diseases, enabling timely intervention and improved patient outcomes. This paradigm shift in diagnostic approaches aims to overcome the challenges associated with traditional methods, such as human subjectivity, variability in interpretation, and limitations in processing vast amounts of complex imaging data. Moreover, the background in this field reflects an expanding body of research highlighting the success of deep learning in various medical imaging applications (Christe et al., 2019). The success stories range from detecting abnormalities in radiological images to assisting in the identification of specific disease markers. However, challenges remain, including the interpretability of deep learning models, ethical considerations, and the seamless integration of these technologies into routine clinical workflows.
The Swin Transformer, built on the self-attention mechanism, introduces spatial information into the transformer architecture, enhancing its capability to capture intricate patterns within images (Ukwuoma et al., 2022). The Vision Transformer, originally developed for natural image classification tasks, has likewise shown remarkable adaptability in medical imaging by efficiently leveraging self-attention to learn representations directly from pixel values, eliminating the need for handcrafted features. When integrated with CNN architectures, these transformer-based models offer a potent framework for lung disease detection, facilitating comprehensive analysis of chest radiographs or CT scans (Ghali & Akhloufi, 2023).
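As a hedged illustration of how such a transformer backbone can be adapted to chest imaging, the sketch below fine-tunes torchvision's pre-trained Vision Transformer (vit_b_16) with a new classification head; the three-class label set and channel replication are assumptions made here for illustration.

```python
# Adapting a pre-trained Vision Transformer to chest X-ray classification
# (illustrative: the 3-class head and grayscale handling are assumptions).
import torch
import torch.nn as nn
from torchvision import models

vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the classification head for, e.g., normal / pneumonia / tuberculosis.
vit.heads.head = nn.Linear(vit.heads.head.in_features, 3)

# Grayscale X-rays can be replicated to three channels to match the
# ImageNet-trained patch embedding.
xray = torch.randn(4, 1, 224, 224).repeat(1, 3, 1, 1)
logits = vit(xray)
print(logits.shape)  # torch.Size([4, 3])
```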
The background in lung disease detection using deep learning techniques reflects artificial intelligence's potential to revolutionize the accuracy, efficiency, and early detection capabilities of respiratory medicine. The integration of deep learning into this domain represents a promising frontier that can significantly impact public health outcomes and enhance the overall effectiveness of lung disease diagnosis and treatment.
Lung cancer and its diagnosis
Lung cancer remains a serious worldwide health concern, responsible for a substantial number of cancer deaths each year. It arises from the uncontrolled growth of abnormal cells within the lung tissues, often forming malignant tumors that can metastasize to other parts of the body. As one of the leading causes of cancer-related mortality, lung cancer underscores the importance of early detection and accurate diagnosis for effective treatment strategies and improved patient outcomes.
Types of lung cancer:
Lung cancer is broadly divided into two primary kinds according to the cells of origin and their characteristics:
Non-small cell lung cancer (NSCLC):
This encompasses several subtypes, including large cell carcinoma, squamous cell carcinoma, and adenocarcinoma. NSCLC is the most prevalent form of lung cancer, often detected in later stages because it produces few early symptoms.
Small cell lung cancer (SCLC):
This type is less common but more aggressive. SCLC tends to spread rapidly, making early detection crucial for effective treatment.
Risk factors and new findings:
While smoking continues to be a predominant risk factor for lung cancer, new research has highlighted other contributing factors and molecular mechanisms.
Genetic mutations:
Advances in genomic analysis have uncovered specific genetic mutations that contribute to the development of lung cancer. Mutations in genes such as EGFR, ALK, ROS1, and BRAF have prompted the creation of targeted treatments tailored to the individual's genetic makeup.
Environmental exposures:
Beyond tobacco use, exposure to radon, asbestos, and other air pollutants can increase the risk of lung cancer. Additionally, genetic predispositions are now being studied in conjunction with environmental factors.
Immune checkpoint inhibitors:
The emerging class of immune checkpoint inhibitors, a type of immunotherapy drug, has shown remarkable success in treating certain types of advanced lung cancer. These drugs block specific proteins that inhibit the immune response, restoring the immune system's ability to identify and combat cancer cells.
Diagnostic techniques and innovations:
Advancements in diagnostic techniques have improved the precision and effectiveness of lung cancer identification.
Liquid biopsies:
Liquid biopsies examine a sample of circulating tumor DNA from the blood. This non-invasive technique enables monitoring of tumor mutations and treatment response, potentially replacing invasive tissue biopsies.
Artificial Intelligence (AI) and machine learning:
AI algorithms, in particular deep learning models such as convolutional neural networks (CNNs), are being applied to analyze medical imaging data. AI-driven systems can identify subtle patterns in CT scans and X-rays, aiding radiologists in early cancer detection.
Early detection screening programs:
Low-dose computed tomography (LDCT) screening is recommended for people at high risk of lung cancer, such as current or former heavy smokers, to identify the disease at its earliest stages, when treatment is most effective.
Overdiagnosis and False Positives:
Early detection efforts can lead to overdiagnosis and unnecessary interventions due to the identification of indolent, non-progressing tumors.
Integration of AI:
While AI holds great promise in improving diagnosis, its integration into clinical practice requires validation, standardization, and addressing concerns about reliability and interpretability. In conclusion, lung cancer diagnosis has evolved significantly due to breakthroughs in genetics, diagnostic techniques, and AI-driven approaches. With a focus on targeted therapies, precision medicine, and early detection efforts, the field is moving toward more personalized and effective strategies in the battle against lung cancer.
Transfer Learning in Medical Imaging
Transfer learning, a powerful concept in machine learning, has found significant applications in the domain of medical imaging. It involves leveraging knowledge obtained from a source task to enhance performance on a target task. In medical imaging, where labeled datasets are often limited and the demand for accurate diagnoses is critical, transfer learning has surfaced as a key strategy for enhancing the efficiency of machine learning models. It is also central to this thesis: the principle provides a framework for the thesis's goals and is crucial in resolving issues such as data scarcity, model generalization, and effective knowledge utilization.
Three principles underpin this approach: feature reusability, knowledge transfer, and task relationships.
Reusability of Features:
Transfer learning exploits the notion that features learned for a source task can be reused for a target task. These common characteristics, frequently extracted by deep neural networks, capture pertinent high-level representations that condense the most important information in the data. Even when the target task has little data, the model can accelerate its learning process by reusing these features.
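A minimal sketch of feature reuse is given below, assuming a frozen ImageNet-trained ResNet-18 is used as a fixed feature extractor and only a lightweight classifier is trained on its 512-dimensional features; the data here are random placeholders.

```python
# Feature reuse: a frozen backbone extracts features, a small classifier is trained.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()      # expose the 512-d penultimate features
backbone.eval()                  # the reused features are not fine-tuned

with torch.no_grad():
    features = backbone(torch.randn(16, 3, 224, 224))  # shape: (16, 512)

# A small task-specific classifier consumes the reused features.
classifier = nn.Linear(512, 2)
logits = classifier(features)
print(features.shape, logits.shape)
```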
Knowledge Transfer:
Transfer learning makes it easier to move knowledge from one domain to another. This information encompasses the relationships, structures, and patterns found in the source data that are useful for comprehending the target data. The transfer of this knowledge improves the model’s capacity for generalization, adaptation, and precise prediction on the intended task.
Relationships between Tasks:
The effectiveness of transfer learning depends on the relationship between the source and target tasks. A carefully chosen source task that shares features, structures, or concepts with the target task ensures that the knowledge learned from the source is pertinent and transferable to the target work.
Model Pre-training and Fine-tuning:
Pre-training the model:
The thesis entails pre-training a deep neural network on a source task using a larger and more varied dataset. This pre-training step enables the model to pick up general traits that are pertinent to the target domain and will be used for the task at hand.
Fine-tuning:
The model will be fine-tuned using a smaller, more focused dataset for the target task. The parameters of the model will be tweaked during fine-tuning to focus its knowledge on the target task while keeping the beneficial traits discovered during the pre-training phase.
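The two-stage recipe above can be sketched as follows; the stage lengths and learning rates are illustrative assumptions rather than the thesis's final training schedule.

```python
# Two-stage fine-tuning sketch: train the new head first, then fine-tune the
# whole network at a lower learning rate (illustrative hyperparameters).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# Stage 1: freeze the pre-trained layers and train only the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
head_optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ... train for a few epochs with head_optimizer ...

# Stage 2: unfreeze everything and fine-tune gently, so the useful
# pre-trained features are adjusted rather than destroyed.
for p in model.parameters():
    p.requires_grad = True
finetune_optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
# ... continue training with finetune_optimizer ...
```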
Domain Adaptation:
Techniques for domain adaptation will be investigated because the target task in the thesis may have domain-specific characteristics. By bridging the gap between the source and target domains, these strategies aim to guarantee that the model can effectively transfer its learned knowledge even when the data distributions differ.
Advantages and Relevance:
The advantages and relevance of transfer learning to this thesis are as follows.
Data Efficiency:
Transfer learning is especially useful when little data is available for the target task. The thesis work maximizes data efficiency and minimizes the risk of overfitting on the target task by utilizing the larger source dataset.
Improved Generalization:
By exposing the model to a wider range of data during the pre-training phase, transfer learning improves the model's capacity to generalize. This broader exposure helps the model learn robust and domain-invariant traits, producing more accurate predictions for the target task.
Effective Knowledge Utilization:
The thesis work maximizes the use of computational resources and training effort by utilizing knowledge from a source task. The pre-trained model captures important information, enabling the model to pick up the particulars of the target task more quickly.
Future Perspectives:
The thesis’s incorporation of transfer learning paves the way for additional lines of inquiry.
Domain Adaptation Techniques:
As the field of domain adaptation develops, investigating and putting into practice cutting-edge methods to manage domain shifts more effectively will be a valuable direction for further research.
Multi-Task Learning:
Building on transfer learning, multi-task learning trains a model concurrently on several related tasks, with the goal of improving knowledge transfer and model performance across tasks. Future studies should also take into account ethical issues, such as potential bias induced by the source data, and ensure that the transferred knowledge aligns with the particular requirements of the target domain.
Reviewed Studies
This paper (Ghimire & Subedi, 2024) presents a novel deep learning-based automatic lung volume estimation method that learns segmentation and volume regression simultaneously in a multitask setting. The two networks, a segmentation network and a regression network, share a few layers and are operated sequentially: the final lung volume is obtained by regressing the output of the segmentation network, which segments the X-ray images. The dataset gathered for the proposed method comes from three distinct secondary sources. The experimental results demonstrate that the multitasking strategy outperforms the individual networks. Further examination of the multitasking strategy with two distinct networks, HRNet and UNet, reveals that HRNet outperforms UNet with a lower volume estimation mean square error of 0.0010.
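To illustrate the sequential segmentation-then-regression idea (not the paper's actual HRNet/UNet implementation), the following hedged sketch wires a toy segmentation network to a volume-regression head and trains both with a joint loss; all layer sizes are illustrative assumptions.

```python
# Multitask sketch: segment first, then regress volume from the predicted mask.
import torch
import torch.nn as nn

class SegThenRegress(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny stand-in for the segmentation network (a UNet/HRNet in the paper).
        self.segment = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),  # lung mask in [0, 1]
        )
        # Regression head that maps the predicted mask to a volume estimate.
        self.regress = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1),
        )

    def forward(self, x):
        mask = self.segment(x)          # task 1: lung segmentation
        volume = self.regress(mask)     # task 2: volume estimated from the mask
        return mask, volume

model = SegThenRegress()
xray = torch.randn(2, 1, 128, 128)
mask, volume = model(xray)
# Joint multitask loss: segmentation (BCE) + volume regression (MSE).
seg_loss = nn.functional.binary_cross_entropy(mask, torch.rand_like(mask))
vol_loss = nn.functional.mse_loss(volume, torch.rand(2, 1))
(seg_loss + vol_loss).backward()
```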
Currently, lung cancer is one of the most common causes of death from malignancy. Recognition and precise identification of potentially carcinogenic lung nodules at an early stage of their development would increase treatment feasibility and reduce lung cancer mortality. The lack of distinguishable symptoms until the cancer has already spread is a major barrier to early detection. Analysis and screening are planned using non-invasive imaging techniques such as computed tomography (CT), but a precise automated analysis of these images is needed to fully realize the potential of this technology. Image segmentation is an important stage of that process.
This research (Lu, 2021) describes how, first, a deep convolutional neural network was built taking into account the intricacy and fuzzy features of lung CT scans. Second, the relationship between detection rate and model parameters (iterations, various resolutions) was explored. Third, the impact of various model configurations on lung tumor detection was examined through adjustments to the convolution kernel size, feature dimension, and network depth. Additionally, optimization techniques affecting DCNN performance were examined from three angles: training algorithms (batch gradient descent and gradient descent with momentum), activation functions, and pooling techniques. Ultimately, the trial findings confirmed that a DCNN can be utilized for automated lung tumor diagnosis and that, with the right model settings and framework, together with a gradient descent with momentum approach, it can obtain a high identification rate.
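The training-algorithm comparison mentioned above (plain gradient descent versus gradient descent with momentum) can be illustrated on a toy objective; the learning rate, momentum value, and loss function below are purely illustrative.

```python
# Toy comparison of gradient descent with and without momentum.
import torch

def run(momentum: float, steps: int = 50) -> float:
    w = torch.tensor([5.0], requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1, momentum=momentum)
    loss = (w ** 2).sum()
    for _ in range(steps):
        opt.zero_grad()
        loss = (w ** 2).sum()      # toy loss with minimum at w = 0
        loss.backward()
        opt.step()
    return loss.item()

print("plain GD loss:    ", run(momentum=0.0))
print("GD with momentum: ", run(momentum=0.9))
```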
Fuzzy-based image segmentation schemes use the minima and maxima computed along each row and column. This study (Akter et al., 2021) developed an approach that uses median values evaluated along each row and column in addition to the maxima and minima, and found that this methodology increased the segmentation precision of these images. In the next phase of the experiment, a neuro-fuzzy classifier divided the segmented lung nodules into malignant and benign nodules. Sensitivity, specificity, and accuracy were used as evaluation parameters. With a lower false positive rate, the proposed method produced a sensitivity, specificity, accuracy, and precision of 100%, 81%, 86%, and 90%, respectively. Although radiologic screening lowers deaths in persons at high risk of lung cancer, only a tiny percentage of eligible people undergo such screening in the United States. The availability of blood-based testing may increase the use of screening. To make screening applications more feasible, an improvement to cancer personalized profiling by deep sequencing (CAPP-Seq), a method for examining circulating tumor DNA (ctDNA), was presented. ctDNA is present before treatment in the majority of patients, despite exceedingly low levels in early-stage lung cancers, and its presence is strongly prognostic. Most of the significant alterations in the cell-free DNA (cfDNA) of lung cancer patients and of at-risk controls are non-recurrent and reflect clonal hematopoiesis. In contrast to tumor-derived alterations, clonal hematopoiesis changes occur on longer cfDNA fragments and lack the mutational signatures linked to tobacco use. Combining these findings with other molecular features, a machine learning method called "lung cancer likelihood in plasma" (Lung-CLiP) was developed and provisionally validated; it can effectively separate patients with early-stage lung tumors from at-risk controls. This method enables adjustment of assay specificity to suit particular clinical applications and achieves performance similar to that of tumor-informed ctDNA profiling.
This research (Selvanambi et al., 2020) proposes a higher-order neural network model based on glowworm swarm optimization (GSO) for classifying multimodal malignancy data. Heterogeneous tumor samples are grouped using methods based on higher-order neural networks. The outcomes show that, compared with other methods and baseline procedures, the RNN-GSO approach provides improved accuracy, specificity, and sensitivity. In the future, more precise optimization may be achieved by combining GSO or other swarm intelligence-based optimization algorithms with other, more complex neural networks. In this study, two different networks are tested for lung tumor detection; more complex neural networks could be studied and assessed to determine their effectiveness, and a systematic search technique ought to be investigated for parameter optimization. The findings of this study were simulated using MATLAB, and a real-world scenario is required to assess the network's performance in predicting lung tumors. The method described in this research yields a 98% accuracy rate.
In this article (Filipska & Rosell, 2021), we summarize the latest research on lung cancer-related mutation identification in liquid biopsies and discuss methods for circulating tumor DNA (ctDNA) detection-based lung cancer diagnosis and surveillance. In addition to EGFR modifications, we also concentrate on noteworthy co-mutations that provide additional insight into innovative ctDNA-based liquid biopsy applications.
In this research (Goyal & Singh, 2021), a method is suggested for classifying pneumonia and COVID-19 respiratory diseases from chest X-ray images. Soft computing, neural networks, and deep learning approaches form the foundation of the suggested framework. This model differs significantly from existing COVID-19 detection methods in that it considers lung-specific characteristics only after properly enhancing the image and performing ROI-driven feature extraction and normalization. A deep learning model, F-RNN-LSTM, is presented to improve classification performance with minimal computing resources. The publicly available C19RD and CXIP datasets were used to demonstrate the outcomes for F-RNN-LSTM as well as for machine learning methods such as SVM, ANN, and KNN.
The article (Kooi et al., 2017) aims to provide a taxonomy of the most advanced machine learning systems for detecting lung diseases, illustrate the tendencies of current studies in this field, and pinpoint unresolved problems and possible future directions. All of the publications in the survey share seven characteristics: image types, features, data augmentation, neural network algorithm types, transfer learning, ensembles of classification algorithms, and lung condition types. Other researchers could use the provided taxonomy to organize their research contributions and activities. The proposed future approaches may lead to even greater efficiency gains and a rise in the number of lung disease cases identified by deep learning-assisted applications.
According to this research (Vaidya et al., 2020), radiomic analysis of pretherapy CT scans may identify individuals with NSCLC who are considering immune checkpoint blockade and are likely to experience hyperprogression under this treatment. Other advantages of employing radiomic analysis are the capacity to examine routinely available CT scan images and the fact that risk evaluation is non-invasive, which eliminates the need for extra biopsy samples.
These results demonstrate the utility of cfDNA for lung cancer screening and emphasize the significance of risk assessment in cfDNA-based monitoring trials. Recent research divides leukaemia into two basic subtypes, acute and chronic, each of which can be further split into myeloid and lymphoid forms, giving four distinct types of leukaemia. Numerous methods for leukaemia detection have been suggested based on the subtypes of the disease and their characteristics, but these methods call for improvements in efficacy, the learning process, and general performance. To deliver fast and secure leukaemia detection, a platform based on the Internet of Medical Things (IoMT) is being developed. The proposed IoMT system utilizes cloud computing to connect clinical devices to network resources, allowing communication in real time. Leukaemia testing, diagnosis, and treatment may be coordinated between patients and medical professionals, saving the time and effort of both. The paradigm discussed is also useful for addressing the problems that patients with serious illnesses encounter during pandemics such as COVID-19. Within the recommended structure, leukaemia subtypes are identified using two kinds of convolutional neural networks: a dense convolutional neural network (DenseNet-121) and a residual convolutional neural network (ResNet-34). Two publicly available leukaemia-related databases, ALL-IDB and the ASH image collection, are used in this work. The outcomes showed that the recommended model was accurate and outperforms other renowned machine learning methods in distinguishing between healthy samples and leukaemia subtypes.

Significant progress has been made in the past decade in the identification of circulating tumor DNA (ctDNA), cell-free DNA (cfDNA) fragments shed into the circulation by cancer cells that reflect the genetic alterations of those tissues. As a result, ctDNA found in liquid biopsies serves as a tremendous asset for characterizing cancer patients, directing treatment, identifying resistance, and monitoring relapse. The authors of this study (Filipska & Rosell, 2021) describe lung cancer diagnosis and surveillance techniques based on advances in ctDNA detection and call for further research into the recognition of lung cancer-related changes in liquid biopsies, focusing on both important co-mutations and alterations of the epidermal growth factor receptor (EGFR), which shed further light on novel ctDNA-based liquid biopsy applications. Finally, they consider potential directions for early cancer detection and for methods of separating out clonal haematopoiesis, possibly using microbiome-driven liquid biopsy.

The objective of another study (Lee et al., 2013) is to use deep learning techniques based on convolutional neural networks to help clinical specialists by offering a thorough examination of clinical respiratory sound data for the identification of COVID-19 breathing issues. The suggested technique uses VGG-19, a CNN, and the MFCC audio feature.
Recent developments in AI and deep learning have played a vital role in offering remedies for clinical challenges, improving the ability to predict illness early and accurately. In one proposed framework (Swapna et al., 2021), a patient's peak expiratory flow is recorded and stored via an external mobile phone accessory that is simple to use, small, and sensibly priced. By enabling appropriate treatment for COVID-19 breathing issues, this infection can be handled and prevented, and by hearing the buzzer sound generated by the device, individuals can also stay away from areas they cannot tolerate. A custom programming interface for remote data transmission was also developed. The health monitoring scheme allows all users, such as doctors and patients, to observe patient details from anywhere at any time. Fog computing is an innovative area that gives the server the capacity to operate on congestion-free processing logic. In conventional schemes it is hard to raise an alarm based on predicted emergencies, but the suggested deep learning approach (de Castro et al., 2017) intends to send a warning immediately if any emergency occurs at the patient's end. Because of a shortage of trained human resources, clinical professionals welcome technological help, since it permits them to manage more patients. Besides major health conditions like cancer and obesity, the effect of respiratory sicknesses is continuously expanding and becoming hazardous for society in the COVID-19 situation. In respiratory infections, early diagnosis and treatment are critical, so respiratory sounds are becoming extremely valuable for alerting specialists to patients in an emergency who require prompt attention.

In general, many countries struggle to manage their infrastructure for disaster preparedness. IoT and AI together have expanded such systems' capabilities by providing constant data collection through sensors and actuators for forecasting systems. The use of an IoT device to monitor indoor environmental parameters as multidimensional time series in dwellings is recommended by (Hitimana et al., 2021). Through the application of an LSTM deep learning algorithm, information on the presence of individuals is inferred. Using multidimensional time series data as predictors for the regression problem, an experiment is conducted in an office. The results show that the framework enables the collection, connection, and preservation of environmental data. The LSTM was trained on the collected data and evaluated against other AI methods: a support vector machine (SVM) classifier, naive Bayes, and a multilayer perceptron feed-forward network. The results following parameter tuning show that the LSTM model outperforms the others in this application.
In addition to common illnesses like cancer and diabetes, respiratory infections are gradually increasing and posing a hazard to the population. Early identification and diagnosis are crucial for respiratory infections; thus the analysis of breathing sounds is proving especially helpful when combined with chest X-rays. By thoroughly analyzing clinical respiratory sound data, the proposed study aims to employ deep learning-based CNN algorithms to assist medical practitioners in detecting chronic obstructive pulmonary disease. In the conducted tests (Shim et al., 2005), Librosa audio features such as the mel-spectrogram, MFCC, Chroma CENS, and Chroma were used. The framework can also be used to describe the illness's severity, using terms like mild, moderate, or severe. The assessment results support the suggested deep learning technique: the platform achieves an ICBHI score of 93 percent for classification accuracy. Additionally, the reported performance was validated using 10-fold cross-validation in the conducted tests.
This work (Lakshmanaprabu et al., 2019) introduces a novel predictive categorization technique for lung CT (computed tomography) images. In this study, CT scans of lung images were processed using Linear Discriminant Analysis (LDA) and an Optimal Deep Neural Network (ODNN). LDA is utilized to reduce the dimensionality of the deep features extracted from CT lung images in order to categorize lung nodules as benign or malignant. To determine the classification of lung illness, the ODNN is applied to CT scans, after which the Modified Gravitational Search Algorithm (MGSA) is used for optimization. According to the comparative results, the suggested classifier achieves a sensitivity of 96.2%, a specificity of 94.2%, and an accuracy of 94.56%.
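As a hedged sketch of the LDA dimensionality reduction step (with synthetic features standing in for the CT-derived deep features, and without reproducing the ODNN/MGSA pipeline):

```python
# LDA reduces high-dimensional deep features before benign/malignant classification.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# 200 nodules x 256 synthetic deep features, with binary benign/malignant labels.
X = rng.normal(size=(200, 256))
y = rng.integers(0, 2, size=200)
X[y == 1] += 0.5          # give the "malignant" class a shifted distribution

# For a 2-class problem LDA projects onto at most one discriminant direction.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)                 # (200, 1)
print("train accuracy:", lda.score(X, y))
```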
A further study (Masood et al., 2018) similarly emphasizes the utility of cfDNA for lung cancer screening and the motivation for an IoMT-based platform for fast and secure leukaemia detection, as discussed above.
This work (Khosravan & Bagci, 2018) investigated the hypothesis that the efficacy of computer-assisted diagnosis (CAD) algorithms on both tasks can be enhanced by jointly learning nodule false positive (FP) reduction and nodule identification. To corroborate this hypothesis, a 3D deep multi-task CNN is proposed to address the two problems simultaneously. The method demonstrated an average Dice similarity coefficient (DSC) of 91% for segmentation precision and a nearly 92% score for FP elimination on the LUNA16 dataset. Improvements were shown in both the segmentation and FP reduction tasks over two baselines, supporting the idea that tackling these two tasks simultaneously with a multi-task learning strategy enhances system efficiency on both.
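The Dice similarity coefficient reported above can be computed as follows; this is the common binary-mask formulation, assumed here for illustration with synthetic masks.

```python
# Dice similarity coefficient (DSC) for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks A and B."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((64, 64)); pred[20:40, 20:40] = 1      # predicted nodule mask
target = np.zeros((64, 64)); target[25:45, 25:45] = 1  # ground-truth mask
print(f"DSC = {dice(pred, target):.3f}")               # 0.562 for this overlap
```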
In this research (Achu et al., 2021), two artificial neural networks were introduced that combine multiple residual streams at varying resolutions to identify lung cancers from CT images. The findings show a reduction in segmentation error across a variety of samples. The method can be used for long-term tumor volume tracking in cancers undergoing immunotherapy treatment, which changes the tumors' appearance and extent on computed tomography scans. Given its success with lung cancers, the approach is potentially applicable to other sites as well. The two suggested MRRNs perform better than current techniques.
Related findings (Nam et al., 2019) likewise demonstrate the utility of cfDNA for lung cancer screening and emphasize the significance of risk matching in cfDNA-based monitoring investigations.
An LSTM-based approach of the same kind, inferring occupancy from multidimensional time series data and outperforming SVM, naive Bayes, and multilayer perceptron baselines, has also been reported (Eun et al., 2018).
The goal of image segmentation is to identify the exterior contour and voxel data associated with the region of interest. Segmenting organs or lesions is the primary use of segmentation in medical vision, making possible the quantitative analysis of pertinent clinical factors and the provision of further information for follow-up diagnosis and therapy. Target delineation, for instance, is essential for guiding tumor radiotherapy and surgical image navigation (Dawoud, 2011).
A comparable multiple-resolution residual approach (Tan et al., 2019) also reports success with lung cancers, potential applicability to other sites, and performance exceeding current techniques.
The suggested network design consists of three separate CNN branches, each with seven stacked layers, that accept multi-scale nodule patches as input (Wang et al., 2017). The three CNN branches are then combined with a fully connected layer to determine whether the patch's center voxel is part of the nodule. To test the suggested methodology, 893 nodules from the public LIDC-IDRI dataset, which includes ground-truth annotations and CT imaging data, were used. The MV-CNN outperformed traditional image classification techniques, exhibiting promising performance in segmenting a variety of nodule types, including juxta-pleural, cavitary, and non-solid nodules; it achieved an average DSC of 77.67% and an average surface distance (ASD) of 0.24.
This research (Dutta, 2021) presents the Dense Recurrent Residual Convolutional Neural Network (Dense R2U CNN), a combination of a recurrent CNN, a residual network, and a dense convolutional network based on the U-Net model architecture. The dense recurrent layers improve signal transmission, which is necessary for categorization, and the residual unit aids in training a deeper network. Compared with similar models, the suggested model performed better on segmentation tests when evaluated on the benchmark lung lesion dataset.
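A simplified sketch of the kind of recurrent residual convolutional unit combined in such architectures is given below; the channel count and recurrence depth are illustrative assumptions, not the Dense R2U CNN's exact configuration.

```python
# Simplified R2U-style recurrent residual block (illustrative sizes).
import torch
import torch.nn as nn

class RecurrentConv(nn.Module):
    """Applies a shared convolution recurrently, feeding its output back in."""
    def __init__(self, channels: int, t: int = 2):
        super().__init__()
        self.t = t
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        out = self.conv(x)
        for _ in range(self.t):
            out = self.conv(x + out)   # recurrent refinement of the features
        return out

class RecurrentResidualBlock(nn.Module):
    """Two recurrent conv units with an identity (residual) skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(RecurrentConv(channels), RecurrentConv(channels))

    def forward(self, x):
        return x + self.body(x)        # the residual unit aids deeper training

block = RecurrentResidualBlock(16)
print(block(torch.randn(1, 16, 64, 64)).shape)   # torch.Size([1, 16, 64, 64])
```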
This research (Theresa & Bharathi, 2016) focused on reducing false positives while detecting lung nodules using transfer learning. The authors employed a CNN pre-trained on ImageNet and adapted it for the lung nodule detection task, demonstrating that transfer learning could aid in reducing false positives, a critical concern in medical image analysis. Another study (Shin, Roberts, et al., 2016) proposed a novel approach to knowledge transfer in chest X-ray classification, introducing the concept of "partially shared features," where the model learns both shared and task-specific features during transfer learning. This approach improved classification accuracy by effectively utilizing both source and target domain information.
When it comes to lesion detection in medical imaging, lung segmentation is essential for both thorax extraction (which eliminates artifacts) and lung extraction. Various threshold approaches to lung segmentation have been studied, including the basic threshold method, the iterative threshold, the Otsu threshold, and the adaptive threshold (Peng et al., 2022). Numerous deep learning techniques have also been investigated for lung segmentation. Using the LIDC-IDRI dataset, (Wang et al., 2017) created a multi-view CNN (MV-CNN) with an average DSC of 77.67% and an average ASD of 0.24 for lung nodule segmentation. In contrast to a traditional CNN, the MV-CNN combines several input images to identify lung nodules, but it finds 3D CT scans challenging to process. To capture the volumetric patterns of malignant nodules, a 3D CNN was created (Hamidian et al., 2017). Another line of research (Fernandes et al., 2017) aims to extract and evaluate the lesion/tumor from 2D and 3D radiological images. The primary clinical problem is to design an appropriate CAD system and a Content-Supported Medical Image Retrieval (CSMIR) system; consequently, it is imperative to create a computerized system that can identify, classify, and quantify lesions in the lungs and tumors. In the suggested study, the ROI is identified using a unique unit identifier, and lung abnormalities are segmented employing a wavelet-based methodology. The test outcome validates that the suggested strategy delivers higher precision and sensitivity than the other approaches considered. A further report of the LSTM-based multidimensional time series approach discussed earlier is given by (George et al., 2018).
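As an illustrative sketch of the threshold-based segmentation approaches surveyed above, the following applies Otsu's method to a synthetic image with two dark "lung" regions; a real pipeline would add background removal and selection of the lung components.

```python
# Otsu-threshold lung segmentation sketch on a synthetic chest-image stand-in.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label

# Synthetic image: bright "body", two darker "lung" fields, mild noise.
img = np.full((128, 128), 0.8)
img[30:100, 20:55] = 0.2      # left lung field
img[30:100, 73:108] = 0.2     # right lung field
img += np.random.default_rng(0).normal(0, 0.02, img.shape)

# Otsu picks the threshold that best separates the two intensity classes.
t = threshold_otsu(img)
lungs = img < t               # lungs are the darker (low-attenuation) class

# Connected-component labelling isolates the individual lung fields.
components = label(lungs)
print(f"threshold={t:.2f}, lung pixels={lungs.sum()}, regions={components.max()}")
```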
First, we investigate and assess various CNN designs; the examined models vary in depth and contain from roughly five thousand to 160 million parameters (Shin, Roth, et al., 2016). Next, we assess how dataset size and spatial image context affect performance. Lastly, we investigate the usefulness and limitations of transfer learning from pre-trained ImageNet models (via fine-tuning). We examine two particular computer-aided detection (CADe) tasks: the classification of interstitial lung disease (ILD) and the detection of lymph nodes (LNs) in the thoraco-abdominal region. We obtain state-of-the-art results in mediastinal LN detection and report the first five-fold cross-validation results for predicting ILD categories from axial CT slices. Our thorough empirical assessment, CNN model analysis, and insights can inform the design of high-performing computer-aided detection (CAD) systems for many medical imaging applications.
Transfer learning has attracted substantial attention within medical imaging because of its potential to enhance diagnostic accuracy and reduce the demand for extensive labeled data. Several studies have explored transfer learning in this domain, aiming to leverage knowledge from related tasks and domains to improve medical image analysis. Here, we review some notable studies and approaches (Rajpurkar et al., 2017). This study (Jiang et al., 2018) demonstrated the power of transfer learning in medical imaging by training a convolutional neural network (CNN) called CheXNet to diagnose pneumonia from chest X-rays. The CNN was pre-trained on a massive dataset of natural images (ImageNet) and then fine-tuned on a much smaller dataset of chest X-rays. This approach achieved radiologist-level performance, showcasing how transfer learning can mitigate the challenge of limited medical imaging data (Kuan et al., 2017). In this work (Lin et al., 2021), the authors used transfer learning to tackle the Kaggle Data Science Bowl 2017, which focused on lung cancer detection in computed tomography (CT) scans. They employed a CNN pre-trained on natural images (InceptionV3) and fine-tuned it on the lung cancer dataset, highlighting how transfer learning can provide a head start in learning relevant features for complex medical imaging tasks (Nogues et al., 2016). This study investigated the impact of transfer learning on a CAD system for lung nodule detection in CT scans. The authors pre-trained CNN models on ImageNet and then fine-tuned them on the lung nodule dataset; the transferred knowledge significantly improved detection performance, underscoring the benefits of leveraging generic features (Y. Wang et al., 2020). This research (Teramoto et al., 2020) focused on using transfer learning to reduce false positives in lung lesion detection. The authors employed a CNN pre-trained on ImageNet and adapted it to the lung nodule detection task, demonstrating that transfer learning can help reduce false positives, a critical concern in medical image analysis. This study (Shin, Roberts, et al., 2016) proposed a novel approach to knowledge transfer in chest X-ray classification: the authors introduced the concept of “partially shared features,” where the model learns both shared and task-specific features during transfer learning, improving classification accuracy by effectively utilizing both source and target domain information.
The suggested network design comprises three separate CNN branches, each stacking seven layers and accepting multi-scale nodule patches as input (Zhu et al., 2018). The three CNN branches are then combined with a fully connected layer to determine whether the patch's center voxel belongs to the nodule. The proposed methodology was tested on 893 nodules from the publicly available LIDC-IDRI dataset, which includes ground-truth annotations and CT imaging data. The MV-CNN outperformed traditional image classification techniques, showing promising performance for segmenting a variety of nodule types, including juxta-pleural, cavitary, and non-solid nodules, and achieved an average dice similarity coefficient (DSC) of 77.67% and an average surface distance (ASD) of 0.24.
In this research (Zhang et al., 2021), the authors introduced an autonomous visual annotation model for chest X-rays using recurrent neural cascades. The model leveraged transfer learning by pre-training on a diverse image dataset before fine-tuning on chest X-rays, demonstrating the adaptability of transfer learning techniques to annotation challenges in medical imaging. An innovative approach (Lin et al., 2021) introduced a radial net framework that enabled effective transfer learning across different medical imaging modalities; the framework leveraged a shared feature space for multi-modal data, enabling knowledge transfer between modalities and highlighting the importance of addressing modality discrepancies in transfer learning. A comprehensive survey (Lin et al., 2021) covered the wide range of transfer learning methods applied to medical image analysis, including fine-tuning, feature extraction, and domain adaptation, and provided insights into applications across segmentation, classification, and detection tasks. (Protonotarios et al., 2022) explored self-supervised learning as a powerful approach for transfer learning in medical imaging: techniques in which models learn from unlabeled data were applied to medical images, demonstrating their potential to leverage abundant unlabeled medical data for improved model initialization. (Kim et al., 2022) focused specifically on transfer learning in biomedical image analysis, providing insights into the challenges, strategies, and successful applications; the study emphasized the importance of selecting appropriate source domains, dealing with domain shifts, and the potential benefits of domain adaptation techniques.
Table 2.1: Summary of related work on lung disease detection
Research gap/deficiency
The research gap in lung disease detection and segmentation using deep learning techniques is characterized by several notable challenges. First, there is a critical need to improve the generalizability and robustness of deep learning models across diverse patient populations and imaging modalities. Gaps also remain in automated lung X-ray detection and segmentation, including resource-intensive training procedures, high computational requirements, and limited generalizability. To reduce diagnostic errors, improved methods are required to handle subtle pathological features: existing models often achieve high detection accuracy but lack segmentation precision. Creating algorithms that are resilient, scalable, and flexible is therefore essential to accurately diagnosing lung diseases.
Methodology
Dataset
The dataset for lung disease detection encompasses a comprehensive repository of medical images, specifically curated for the identification of two classes: fibrosis and consolidation. It contains 2,676 instances of fibrosis and 3,784 instances of consolidation, as shown in Figures 3.1 and 3.2, providing a substantial and reasonably balanced collection for training and assessing deep learning models. Each image is meticulously labeled to indicate the presence of either fibrosis or consolidation, offering a valuable resource for researchers and practitioners in artificial intelligence for medical imaging. The thesis considers two kinds of lung images: X-rays and CT scans. Including both modalities adds a crucial dimension to the dataset, acknowledging the varied diagnostic technologies used in clinical settings.
The inclusion of these two classes reflects the diversity of pathological conditions affecting the lungs, enabling the development of models capable of discerning nuanced patterns associated with fibrosis and consolidation. This dataset serves as a foundational tool for advancing the accuracy and effectiveness of lung disease detection models, facilitating the creation of robust algorithms that can contribute to earlier diagnosis and more targeted interventions for patients affected by these respiratory conditions.
(a) CT image (b) X-ray image
Figure 3.1: Fibrosis images
(a) CT image (b) X-ray image
Figure 3.2: Consolidation images
Types of diseases
The two lung diseases considered in this study, fibrosis and consolidation, are described as follows:
Fibrosis
Fibrosis, a pathological condition characterized by the formation of excess fibrous connective tissue in the lungs, poses a significant challenge in diagnostic imaging. Leveraging CNN methods for the analysis of CT and X-ray images has emerged as a promising approach in detecting and classifying fibrosis. CT scans, with their high-resolution cross-sectional images, provide detailed insights into the lung parenchyma and are instrumental in visualizing the extent and distribution of fibrotic tissue. CNNs, with their ability to automatically learn hierarchical features, excel in recognizing subtle patterns indicative of fibrosis in CT images. Additionally, X-ray imaging, a widely accessible diagnostic tool, plays a crucial role in the initial assessment of fibrotic lung changes. Applying CNNs to X-ray images allows for a rapid and cost-effective screening method, aiding in the early detection of fibrosis.
Consolidation
Consolidation is a lung pathology characterized by the accumulation of fluid or inflammatory exudates in the airspaces, necessitating accurate and timely detection for effective patient management. Utilizing CNN methods for the analysis of CT and X-ray images has emerged as a promising avenue in consolidative lung disease diagnosis. CT imaging, with its detailed cross-sectional views, offers a comprehensive assessment of lung parenchyma, allowing CNNs to automatically learn and discern the subtle patterns indicative of consolidation. The high-resolution capabilities of CT scans enhance the precision of CNN-based detection algorithms. Additionally, the application of CNNs to X-ray images, a widely used and accessible diagnostic modality, provides a rapid and cost-effective means of screening for consolidative lung changes.
Data Preprocessing
Preprocessing is the first phase of this work. The goal is to improve image quality, which leads to better categorization and segmentation performance. Data preprocessing is an essential phase of the machine learning pipeline, encompassing tasks such as image resizing, normalization, data augmentation, background removal, histogram equalization, handling of missing data, label encoding, and data splitting. This process holds significant importance in any machine learning project because the quality and structure of the data directly influence the accuracy and effectiveness of the resulting model.
Initially, the raw images, which may be obtained from various sources such as CT scans or X-ray machines, undergo normalization to standardize pixel intensities and ensure consistent data representation. Subsequently, noise reduction techniques are often applied to mitigate artifacts or irregularities that may be present in the images, ensuring a cleaner input for the deep learning model. Image resizing and cropping are commonly employed to standardize dimensions and focus on relevant regions of interest within the lung area. Data balancing methods are applied to address any class imbalances that may exist within the dataset, ensuring that the model is exposed to an equitable representation of each target class.
Preprocessing reduces noise and unnecessary complexity in the dataset, enabling machine learning algorithms to focus on important patterns. This efficient utilization of resources enhances model training and prediction speed. Preprocessing enhances the model’s ability to learn meaningful patterns and connections from the data, ultimately leading to more accurate and reliable lung disease detection, as shown in Figure 3.3.
(a) Original image (b) Preprocessed image
Figure 3.3: Preprocessing
Image resizing
The dataset is separated into training and validation sets for effective model training and evaluation. Originally, the images measured 240×320 pixels. To make them more suitable for classification, all images are uniformly resized to 224×224 pixels, a standard practice that facilitates efficient computation and training of Convolutional Neural Network (CNN) models. Resizing to a smaller dimension serves several purposes: above all, it reduces the computational demands of processing the images, since smaller inputs require fewer computations and, in the fully connected layers, fewer parameters, making CNN training more manageable and efficient.
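As a minimal sketch of this step (the function name is illustrative and the images are assumed to arrive as a float tensor batch):

```python
import tensorflow as tf

# Minimal resizing sketch: bring a batch of 240x320 images down to 224x224.
# `images` is assumed to be a tensor of shape (batch, 240, 320, 3).
def resize_batch(images: tf.Tensor) -> tf.Tensor:
    # Bilinear interpolation (the Keras default) preserves smooth intensity
    # gradients, which matters for radiographic detail.
    return tf.image.resize(images, size=(224, 224), method="bilinear")

resized = resize_batch(tf.zeros((4, 240, 320, 3)))
print(resized.shape)  # (4, 224, 224, 3)
```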
Normalization
Normalization for lung disease detection with Convolutional Neural Networks (CNNs) involves a systematic process to optimize the model's performance in accurately identifying diseases from the fibrosis and consolidation images. Manual checks verify that the images match the required target_size = (224, 224) pixels; this consistency in image dimensions gives the model a uniform input space regardless of the original image sizes. Each pixel value is then divided by 255, transforming intensities from the original 0-255 range into a normalized range from 0 to 1, as shown in Equation 3.1: I_norm = I / 255.
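A minimal sketch of this normalization, using the Keras Rescaling layer (the dummy image stands in for the real data):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal normalization sketch: map raw pixel intensities from [0, 255]
# into [0, 1] by dividing by 255 (Equation 3.1).
normalizer = layers.Rescaling(1.0 / 255)

# Example: a dummy 224x224 RGB image with intensities in [0, 255].
raw = tf.random.uniform((1, 224, 224, 3), minval=0, maxval=255)
normalized = normalizer(raw)  # values now lie in [0.0, 1.0)
```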
Data augmentation
Data augmentation is a pivotal aspect of optimizing the Convolutional Neural Network (CNN) for lung disease detection. This technique generates diverse training samples by applying different modifications to the original lung images. Augmentation operations such as random rotations, flips, and zooms broaden the training dataset's variety, reducing the risk of overfitting and enabling the CNN to generalize better to unseen data. Zooming resizes an image such that some parts are magnified and others shrunk, creating variations of the original images and improving the model's ability to recognize structures at different scales. Horizontal flipping mirrors the image left-to-right. The main augmentation operations, with a minimal pipeline sketched after the list below, are as follows:
Rotation: The image is rotated by a certain angle to simulate different acquisition orientations. This helps the model recognize lung disease patterns independent of their orientation in the input image.
Scaling: The image is resized to be larger or smaller, simulating variations in the apparent size of anatomical structures. This allows the model to detect and recognize lung disease patterns of different dimensions.
Flipping: The image is flipped horizontally or vertically, creating mirror images. This helps the model become invariant to left-right or up-down symmetries in the lung images.
Brightness and Contrast Adjustment: The brightness and contrast of the image are modified to imitate changes in lighting conditions. This helps the model adapt to different lighting environments.
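A minimal augmentation pipeline covering these operations can be sketched with Keras preprocessing layers; the exact factors are illustrative assumptions, not the thesis settings:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal augmentation sketch: rotation, zoom/scaling, flipping, and a
# contrast adjustment. Factors are illustrative assumptions.
data_augmentation = tf.keras.Sequential([
    layers.RandomRotation(factor=0.05),      # small rotations (about ±18°)
    layers.RandomZoom(height_factor=0.1),    # mild scaling / zooming
    layers.RandomFlip(mode="horizontal"),    # mirrored chest images
    layers.RandomContrast(factor=0.1),       # exposure/contrast variation
])

# Augmentation layers are active only when training=True.
augmented = data_augmentation(tf.zeros((1, 224, 224, 3)), training=True)
```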
Data splitting
Data splitting is a fundamental step in training and evaluating a Convolutional Neural Network (CNN) for lung disease detection. The dataset is divided into distinct subsets, typically training, validation, and testing sets; here, the images are divided into train and test sets by slicing. The variable test_percentage is set to 0.2, meaning that 20% of the images are used for testing while the remaining 80% are used for training. A flowchart of the proposed work is shown in Figure 3.4, and a minimal slicing sketch follows it.
Figure 3.4: Flowchart of Swin Architecture
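A minimal sketch of the 80/20 split by slicing, with placeholder arrays standing in for the real (pre-shuffled) images and labels:

```python
import numpy as np

# Placeholder arrays; the real data are image tensors of shape
# (N, 224, 224, 3) with matching labels, assumed already shuffled.
images = np.zeros((6460, 1), dtype=np.float32)
labels = np.zeros((6460,), dtype=np.int64)

test_percentage = 0.2  # 20% of the images are held out for testing
split = int(len(images) * (1 - test_percentage))

train_images, test_images = images[:split], images[split:]
train_labels, test_labels = labels[:split], labels[split:]
print(len(train_images), len(test_images))  # 5168 1292
```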
Feature Extraction
Feature extraction is a technique used in image analysis to organize images into smaller units for subsequent processing. In our research, we identify a significant number of traits that support finding and recognizing patterns across large datasets. The process utilizes the convolutional layers of the CNN to automatically learn and extract hierarchical features from the input images; these features may include intensity variations, texture patterns, and spatial relationships within the lung regions. The ability of CNNs to extract relevant features automatically contributes to their effectiveness in lung disease detection.
Convolutional neural network (CNN)
Convolutional Neural Networks (CNNs) have emerged as powerful tools for lung disease detection (fibrosis and consolidation), demonstrating exceptional capabilities in automated image recognition tasks. CNNs discern intricate patterns and features within lung images, enabling the identification of diseases such as fibrosis and consolidation. The CNN architecture includes an input layer, convolutional layers, ReLU activations, pooling layers, and a fully connected layer.
Input layer
The input layer is the first layer of the CNN and takes the image as input. The input image is represented as [height × width × number of color channels]; the channel count denotes the type of image, for example, channels = 3 denotes an RGB image. The same data is passed through data augmentation before being fed into the CNN.
Convolutional layer
The convolutional layer plays a vital role in automatically extracting relevant features from lung images, enabling the CNN to discern patterns associated with different diseases. This layer applies filters, or kernels, that convolve across the input image, performing local receptive field operations to capture distinct features such as textures, edges, and intensity variations. The convolutional layer takes as input the lung images obtained from the datasets; these serve as the raw input on which the CNN performs convolution operations to identify relevant patterns. The output of the convolution operation is a set of feature maps, where each map corresponds to a particular filter and highlights the regions of the input image where that filter detects relevant features. For a two-dimensional image, this amounts to an element-wise multiplication between local image regions and a two-dimensional matrix of weights. Early convolutional layers extract low-level properties, such as edges, intensity, and gradient direction; additional layers build up the high-level features that characterize lung texture and are used to swiftly determine the kind of lung disease.
ReLU
The ReLU layer is also known as the activation layer. Because it introduces non-linearity into the network, a ReLU layer follows every convolution layer. The ReLU activation function replaces any negative input value with zero, f(x) = max(0, x), as shown in Equation 3.3. By promoting the selective activation of neurons, ReLU improves the model's ability to identify the critical properties in lung images that are suggestive of specific diseases, such as fibrosis and consolidation.
Pooling layer
The Pooling Layer helps the CNN become more invariant to variations in scale, orientation, and position of disease-related features. By retaining the most prominent features and discarding redundant information, this layer contributes to the network’s ability to focus on the most critical aspects of the input images, such as distinctive patterns or irregularities associated with different diseases. The pooling layer is used to reduce the spatial dimensions.
Flatten layer
The flatten layer sits between the convolutional and fully connected layers and turns the two-dimensional feature maps into a single feature vector, facilitating the training process. During training, the weights of the connections between the flatten layer and the subsequent dense layers are adjusted iteratively, allowing the model to learn discriminative patterns indicative of specific diseases. Because CNNs process multidimensional data, such as CT scans or X-rays in lung disease detection, the convolutional layers extract spatial features hierarchically and organize them in a multidimensional tensor; the flatten layer reshapes this representation into a one-dimensional vector.
Fully connected layer
Following the extraction of features by the convolutional and pooling layers, the fully connected layer processes this information to make predictions about the presence of specific diseases. This layer connects every neuron to every neuron in the preceding layer, operating on the flattened hierarchical representations learned by the convolutional layers. It accepts feature vectors as input and classifies input images using the softmax function. In lung disease detection, the fully connected layer, also known as the dense layer, plays a crucial role within the CNN architecture; it is typically positioned towards the end of the network, after the convolutional and pooling layers that extract hierarchical features from the input data, which in this context are medical images such as CT scans or X-rays.
Output layer
The output layer serves as the final stage of the network, where the learned features are translated into predictions for specific disease classes. The architecture typically culminates in a dense layer with a softmax activation function, matching the multi-class classification task of identifying different types of lung diseases. Equation 3.4 gives the feature map's output size, O = ((N − F + 2P) / S) + 1, where N is the original input size, F is the kernel size, P is the padding, and S is the stride.
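As an illustration with assumed values: for a 224×224 input (N = 224) convolved with a 3×3 kernel (F = 3), padding P = 1, and stride S = 1, Equation 3.4 gives O = (224 − 3 + 2)/1 + 1 = 224, so the spatial size is preserved; with stride S = 2 the output map halves to 112×112.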
CNN architecture
In this study, we present a custom CNN architecture for the detection of lung diseases. The simulation parameters of the proposed CNN architecture are shown in Table 3.1. The convolutional neural network (CNN) is implemented with TensorFlow and Keras for image classification and handles input images of size 224×224 pixels with three color channels (RGB). The input layer is defined with a shape of (224, 224, 3), indicating the expected dimensions of the input images. Data augmentation is used to increase the diversity of the training dataset by applying transformations such as rotation, flipping, and zooming to the input images.
The next includes a rescaling layer, which scales the pixel values of the input images to a range between 0 and 1. This is a typical stage of preprocessing to normalize the input data. The model architecture consists of multiple sets of convolutional layers, each followed by batch normalization, max-pooling, and activation functions. The convolutional layers (Conv2D) perform feature extraction by applying filters to the input images. The ReLU activation function introduces non-linearity to the model, and batch normalization helps stabilize and accelerate the training process.
After each convolutional block, a max-pooling layer is applied to reduce spatial dimensions and retain essential features. This downsampling helps the model focus on the most important information while discarding less relevant details.
The last part of the architecture includes a flatten layer to reshape the 2D feature maps into a 1D vector, followed by a dropout layer with a dropout rate of 0.5. Dropout helps prevent overfitting by randomly deactivating a fraction of neurons during training. The final layer is a dense layer with two output nodes, matching the binary classification task; a softmax activation produces probability scores for each class, and the model is compiled with an appropriate loss function, optimizer, and evaluation metrics. A minimal Keras sketch of this architecture follows Table 3.1.
Table 3.1: Parameter details of the CNN architecture
Parameters | Values |
Convolutional layer | 6 (3*3) |
Pooling layer | 3 (3*3) |
Optimizer | Root Mean Squared Propagation (rmsprop) |
Epochs | 10 |
Batch size | 32 |
Learning rate | 0.001 |
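A minimal Keras sketch consistent with the settings in Table 3.1 (six 3×3 convolutional layers, three 3×3 pooling layers, RMSprop with learning rate 0.001). The filter counts (32/64/128) and the categorical cross-entropy loss are illustrative assumptions, since only kernel sizes and layer counts are reported:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the CNN in Table 3.1. Inputs are 224x224x3 images; the
# rescaling layer normalizes pixel values to [0, 1] as described above.
model = models.Sequential([layers.Input(shape=(224, 224, 3)),
                           layers.Rescaling(1.0 / 255)])
for filters in (32, 64, 128):          # filter counts are assumptions
    for _ in range(2):                 # two conv blocks per stage -> 6 total
        model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
    model.add(layers.MaxPooling2D(pool_size=(3, 3)))  # 3 pooling layers
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(2, activation="softmax"))

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```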
ResNet-50
Integrating ResNet-50 into a sequential model for lung disease detection, with a focus on distinguishing between fibrosis and consolidation, proves to be a potent strategy. This study constructs a deep learning model for lung disease identification using a pre-trained ResNet50 convolutional neural network (CNN) as a feature extractor. The architecture follows a transfer learning approach, reusing knowledge obtained from training on the ImageNet dataset for a different task.
The initial part initializes the ResNet50 model with pre-trained weights on the ImageNet dataset. The include_top=False argument excludes the fully connected layers at the top of the network, and the input_shape=(224, 224, 3) specifies the expected shape of the input images, which are assumed to be 224×224 pixels with three color channels (RGB).
Following the initialization of ResNet50, a Keras Sequential model is created. ResNet50 is added as the second layer in the sequential model, serving as a feature extractor. ResNet50 is known for its deep architecture with residual connections, allowing it to capture hierarchical features in images effectively. In the context of lung disease detection, these features might include patterns and abnormalities associated with various lung conditions.
A dropout layer with a dropout rate of 0.25 is introduced after the ResNet50 layer. Dropout is a regularization technique that randomly deactivates a fraction of neurons during training, preventing overfitting and enhancing the model’s ability to generalize to new data. The model architecture is further extended with dense layers. The feature maps extracted by ResNet50 are flattened, and two dense layers with rectified linear unit (ReLU) activation functions are added. These dense layers act as classifiers, transforming the extracted features into a representation suitable for binary classification.
Another dropout layer is included after the first dense layer to continue mitigating overfitting. The final dense layer comprises two output nodes with a softmax activation function, indicating that the model is designed for binary classification; specifically, it aims to predict whether an input image represents a healthy or diseased lung. The model is compiled with the RMSprop optimizer, the binary cross-entropy loss function (appropriate for binary classification), and accuracy as the evaluation metric. This setup prepares the model for training on a labeled dataset of lung images to learn and generalize patterns indicative of different lung conditions. The ResNet50-based model has a total of 8 top-level layers, as shown in Table 3.2; this count includes the data augmentation layer, the ResNet50 convolutional base, the dropout layers, the flatten layer, and the dense layers for feature transformation and classification. A minimal sketch of this pipeline follows Table 3.2.
Table 3.2: Parameter details of ResNet_50
Parameters | Values |
Layers | 8 |
Optimizer | Root Mean Squared Propagation (rmsprop) |
Epochs | 10 |
Batch size | 32 |
Learning rate | 0.001 |
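A minimal sketch of the transfer-learning pipeline described above. The dense-layer widths (256 and 128) are illustrative assumptions, and the data augmentation layer that precedes the base is omitted for brevity; the binary cross-entropy loss follows the text, though categorical cross-entropy is the more conventional pairing with a two-way softmax:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Frozen ResNet50 base pre-trained on ImageNet, used as a feature extractor.
conv_base = ResNet50(weights="imagenet", include_top=False,
                     input_shape=(224, 224, 3))
conv_base.trainable = False  # keep the ImageNet features fixed

model = models.Sequential([
    conv_base,
    layers.Dropout(0.25),                   # regularization after the base
    layers.Flatten(),                       # feature maps -> vector
    layers.Dense(256, activation="relu"),   # width is an assumption
    layers.Dropout(0.25),                   # second dropout, per the text
    layers.Dense(128, activation="relu"),   # width is an assumption
    layers.Dense(2, activation="softmax"),  # healthy vs. diseased
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
```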
MobileNet
The inclusion of MobileNet in the neural network model for lung disease detection serves specific purposes in the context of deep learning for problems involving image categorization. MobileNet is chosen for its efficiency, lightweight architecture, and the ability to strike a balance between accuracy and computational cost. The model incorporates MobileNet as its convolutional base, and various layers are added to the model for feature extraction, transformation, and classification.
The first layer added to the model is the data augmentation layer, denoted by data_augmentation. This layer is crucial for enhancing the model’s ability to generalize by using random transformations to the input images during training. Data augmentation helps diversify the training dataset, reducing overfitting and improving the resilience of the model to changes in the source data.
The second layer, ‘conv_base’, represents the MobileNet convolutional base. MobileNet is specifically designed for efficient and lightweight deep learning applications, making it suitable for resource-constrained environments like mobile and embedded devices. The convolutional base is typically pre-trained on large datasets, such as ImageNet, and is capable of extracting hierarchical features from images. A dropout layer with a dropout rate of 0.25 is introduced after the MobileNet convolutional base. Dropout is a regularization method that randomly deactivates a fraction of neurons during training, preventing the model from relying too heavily on specific features and promoting better generalization.
Following the dropout layer, the output is flattened using layers.Flatten(). This operation transforms the output of the convolutional base into a one-dimensional vector, preparing it for the subsequent dense layers. Two dense (fully connected) layers with Rectified Linear Unit (ReLU) activation functions are then added, with 256 and 128 units, respectively, performing feature transformation and abstraction on the flattened output. ReLU activation introduces non-linearity, enabling the model to discover intricate relationships in the data.
Another dropout layer with a dropout rate of 0.25 is introduced after the first dense layer, further helping to prevent overfitting during training. The final dense layer has two output nodes with a softmax activation function (layers.Dense(2, activation='softmax')), indicating that the model is configured for binary classification and predicts the probability of an input belonging to one of two classes. In the context of lung disease detection, these classes could represent healthy and diseased lungs. Table 3.3 summarizes the parameter details of MobileNet, describing layers, epochs, batch size, optimizer, and learning rate; a minimal sketch of the base swap follows Table 3.3.
Table 3.3: Parameter details of MobileNet
Parameters | Values |
Layers | 8 |
Optimizer | Root Mean Squared Propagation (rmsprop) |
Epochs | 10 |
Batch size | 32 |
Learning rate | 0.001 |
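Because the classifier head is the same as in the ResNet50 sketch above, only the frozen convolutional base changes; a minimal sketch of the swap:

```python
from tensorflow.keras.applications import MobileNet

# MobileNet base pre-trained on ImageNet, frozen for feature extraction.
# The same dropout/flatten/dense head as in the ResNet50 sketch applies.
conv_base = MobileNet(weights="imagenet", include_top=False,
                      input_shape=(224, 224, 3))
conv_base.trainable = False
```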
Inception V3
InceptionV3 has been demonstrated to be an efficient deep convolutional neural network architecture for image classification. It is characterized by inception modules, which apply multiple convolutional filters of different sizes in parallel to capture features at various scales. The model is often pre-trained on large datasets such as ImageNet and can be fine-tuned for specific tasks like lung disease detection.
The InceptionV3 model is added as the convolutional base. It consists of multiple inception modules, each capturing features at different levels of abstraction. The pre-trained weights of InceptionV3 enable the model to recognize intricate patterns and structures within the lung images. A dropout layer with a dropout rate of 0.25 is introduced after the convolutional base to prevent overfitting; this layer randomly deactivates a proportion of neurons during training, enabling improved generalization. The flatten layer is included to transform the output from the InceptionV3 convolutional base into a one-dimensional vector, preparing it for input into the subsequent dense layers.
Two dense (fully connected) layers with Rectified Linear Unit (ReLU) activation functions are added. These layers perform feature transformation and abstraction on the flattened output, capturing high-level representations relevant to lung disease detection. Another dropout layer with a dropout rate of 0.25 is introduced after the first dense layer to further contribute to regularization during training. The final dense layer has two output nodes with a softmax activation function, indicating that the model is configured for binary categorization; it predicts the probability of an input image falling into one of two categories, which could represent healthy or diseased lungs. Table 3.4 summarizes the parameter details of InceptionV3, describing layers, epochs, batch size, optimizer, and learning rate. The model uses the RMSprop optimizer for adaptive learning rate adjustments and the binary cross-entropy loss function for the classification task. It is designed to learn and discern lung patterns, contributing to the accurate classification of lung diseases.
Table 3.4: Parameters details of InceptionV3
Parameters | Values |
Layers | 8 |
Optimizer | Root Mean Squared Propagation (rmsprop) |
Epochs | 10 |
Batch size | 32 |
Learning rate | 0.001 |
Swin transformer
The adoption of Swin Transformer in the context of lung disease detection, particularly for identifying fibrosis and consolidation, is motivated by its unique features and capabilities that align with the intricate characteristics of medical images associated with these conditions. Swin Transformer, known for its effectiveness in capturing long-range dependencies and hierarchical features, provides an ideal solution for discerning these intricate patterns within the lung images.
The inclusion of random cropping and horizontal flipping through data augmentation enhances the model's ability to generalize effectively. Lung images can vary in orientation, position, and other factors, and data augmentation helps the model learn features that remain robust and consistent under such variations.
Patch-Based Processing: The Patch Extract and Patch Embedding layers divide the input images into patches and embed them into a lower-dimensional space. This patch-based processing enables the model to capture both local and global dependencies in lung images efficiently. This approach is particularly useful when dealing with medical images, where specific regions of interest may contain critical diagnostic information.
Swin Transformer: The Swin Transformer is a transformer-based architecture that excels at capturing extended connections in images. Its self-attention mechanisms, multi-head attention, and hierarchical processing make it well-suited for image classification tasks. The use of Swin Transformers allows the model to discern intricate patterns and features within lung images, contributing to the detection of subtle abnormalities associated with lung diseases.
Global Average Pooling: The Global Average Pooling 1D layer aggregates information from the final set of patches into a one-dimensional vector. This operation reduces the spatial dimensions to a single value per feature, preparing the data for the final classification layer. Global average pooling is a common practice to condense information while retaining important features.
Dense Output Layer: The model concludes with a dense layer having two output nodes and a softmax activation function, indicating that it is configured for binary classification. This layer predicts the probability distribution of the input belonging to one of two classes, likely representing healthy or diseased lungs.
Table 3.5 shows that the architecture begins with an input layer accepting images of a specified shape. Subsequently, data augmentation techniques, including random cropping and horizontal flipping, are applied to enhance model robustness. The PatchExtract and Patch Embedding layers divide the input image into patches and embed them into a lower-dimensional space. Two Swin Transformer blocks follow, each representing a layer of the Swin Transformer model, which excels at capturing long-range dependencies and hierarchies in images. The Patch Merging layer combines neighboring patches, facilitating interactions between adjacent regions. The GlobalAveragePooling1D layer aggregates information from the final set of patches into a one-dimensional vector, and the model concludes with a dense layer having two output nodes and a softmax activation function for binary classification.
Table 3.5: Parameter details of Swin_transformer
Parameters | Values |
Layers | 10 |
Optimizer | Adam |
Epochs | 10 |
Batch size | 32 |
Learning rate | 0.001 |
Proposed Vision_transformer with self-attention
The proposed methodology for lung disease detection, focusing on the identification of fibrosis and consolidation, introduces an innovative approach leveraging transformer-based models with self-attention mechanisms. The input lung images undergo a series of preprocessing steps, including random cropping and flipping, to augment the dataset and enhance the model's ability to generalize across diverse clinical scenarios. The unique aspect of the methodology lies in its patch-based transformer design. The lung images are first processed through PatchExtract and Patch Embedding layers, which break them down into patches and embed them into a lower-dimensional space. Two instances of the Swin Transformer, each representing a distinct block of the model, follow. The Swin Transformer's self-attention mechanisms enable the model to analyze relationships between distant regions within the lung images, providing the capability to discern subtle patterns associated with fibrosis and consolidation. The methodology's patch-based processing allows the model to focus on local features while maintaining global context, a crucial aspect in the nuanced detection of lung abnormalities.
The proposed model for detecting lung diseases is based on a Vision Transformer (ViT) architecture, which has shown significant success in image classification tasks. The model begins by taking input images of a specified shape and applies data augmentation to enhance its generalization capabilities. Following augmentation, the images undergo a patching process, breaking them down into patches using a Patches layer. These patches are then encoded through a Patch Encoder, which projects them into a lower-dimensional space. The process of patch extraction for lung disease detection involves resizing an image to 224×224 pixels and extracting 256 patches, each containing 588 elements, as shown in Figure 3.5. This technique is commonly used in image-based tasks like lung disease detection, as it captures local patterns and features, enhancing the detection model’s sensitivity to subtle lung disease patterns.
Figure 3.5: ViT with Self-Attention
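The patch arithmetic described above (224/14 = 16, so 16² = 256 patches of 14×14×3 = 588 elements) can be verified with a short sketch; the function name is illustrative:

```python
import tensorflow as tf

# Sketch of the patching step: a 224x224x3 image cut into non-overlapping
# 14x14 patches yields (224/14)^2 = 256 patches of 14*14*3 = 588 elements.
def extract_patches(images: tf.Tensor, patch_size: int = 14) -> tf.Tensor:
    patches = tf.image.extract_patches(
        images=images,
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )
    batch = tf.shape(images)[0]
    return tf.reshape(patches, (batch, -1, patch_size * patch_size * 3))

patches = extract_patches(tf.zeros((1, 224, 224, 3)))
print(patches.shape)  # (1, 256, 588)
```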
The core of the model is a stack of Transformer blocks, each comprising a series of operations. Within each block, layer normalization is applied first, followed by a multi-head self-attention mechanism, allowing the model to capture intricate dependencies between patches, as shown in Figure 3.6. A skip connection preserves important information from the original encoded patches. The block continues with another layer normalization, followed by a Multi-Layer Perceptron (MLP) for capturing non-linear features; a second skip connection integrates the MLP's output with the preceding attention output.
This process is repeated for the specified number of Transformer layers, enhancing the model's capacity to capture both local and global features in the lung images. After the Transformer blocks, the encoded patches undergo additional layer normalization, flattening, and dropout before entering an MLP head for further feature extraction. The final classification layer is a dense layer with a softmax activation function, producing probabilities for the binary classification (diseased or healthy lungs).
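A minimal sketch of one such block, using the dimensions from Table 3.6 (projection dimension 64, 4 attention heads, transformer units [128, 64]); the dropout rate of 0.1 is an illustrative assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers

# One Transformer block as described: LayerNorm -> multi-head self-attention
# -> skip connection -> LayerNorm -> MLP -> skip connection. The input is
# assumed to be encoded patches of shape (batch, num_patches, 64).
def transformer_block(encoded_patches: tf.Tensor) -> tf.Tensor:
    x1 = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
    attention = layers.MultiHeadAttention(
        num_heads=4, key_dim=64, dropout=0.1)(x1, x1)
    x2 = layers.Add()([attention, encoded_patches])   # first skip connection
    x3 = layers.LayerNormalization(epsilon=1e-6)(x2)
    for units in (128, 64):                           # MLP units from Table 3.6
        x3 = layers.Dense(units, activation=tf.nn.gelu)(x3)
        x3 = layers.Dropout(0.1)(x3)
    return layers.Add()([x3, x2])                     # second skip connection

demo = transformer_block(tf.zeros((1, 256, 64)))
print(demo.shape)  # (1, 256, 64)
```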
The learning rate, weight decay, batch size, and number of epochs are crucial hyperparameters that impact the training dynamics and generalization of the model. The choice of image size and patch size determines the granularity of information the model processes. The projection dimension, number of heads and transformer units influence the model’s capacity for capturing spatial relationships and patterns. The transformer layers and MLP head units define the depth and complexity of the model architecture, as shown in Table 3.6. The input shape represents the dimensions of the input images. These parameters collectively shape the behavior and performance of the proposed ViT-based lung disease detection model. Adjusting these values based on the dataset characteristics and task requirements is essential for achieving optimal results.
Figure 3.6: ViT with Self-Attention
Table 3.6: Parameter details of Vision_transformer
Parameters | values |
Learning rate | 0.001 |
Weight decay | 0.0001 |
Batch size | 32 |
Num epochs | 10 |
Image size | 224 |
Patch size | 14 |
Num patches | calculated |
Projection dimension | 64 |
Num heads | 4 |
Transformer units | [128, 64] |
Transformer layers | 8 |
MLP head units | [2048, 1024] |
Input shape | (224, 224, 3) |
The above table describes that the model is trained for 10 epochs using a learning rate of 0.001 and a weight_decay of 0.0001 to prevent overfitting. The batch size is 32, and images are resized to 224×224 pixels and split into a grid of patches of size 14×14. The transformer_layers parameter specifies the number of transformer blocks, while the num_heads parameter determines the number of attention heads in the multi-head attention layer. The input shape is specified as (224, 224, 3).
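A rough sketch of this training configuration; the placeholder model stands in for the assembled ViT, and the categorical cross-entropy loss is an assumption:

```python
import tensorflow as tf

# AdamW applies decoupled weight decay alongside the Adam update. It ships
# with recent TensorFlow releases as tf.keras.optimizers.AdamW (older
# versions expose it through the tensorflow_addons package instead).
optimizer = tf.keras.optimizers.AdamW(learning_rate=0.001, weight_decay=0.0001)

# Placeholder model standing in for the assembled Vision Transformer.
model = tf.keras.Sequential([tf.keras.layers.Input(shape=(224, 224, 3)),
                             tf.keras.layers.Flatten(),
                             tf.keras.layers.Dense(2, activation="softmax")])
model.compile(optimizer=optimizer, loss="categorical_crossentropy",
              metrics=["accuracy"])
# Training would then run with batch_size=32 for 10 epochs, per Table 3.6.
```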
Evaluation metrics
The use of evaluation metrics in lung disease detection is paramount for systematically assessing the performance and reliability of detection models. These metrics provide quantitative measures that offer insights into how well a model is functioning in identifying instances of lung disease from medical imaging data.
Various performance indicators, including accuracy, precision, recall, the confusion matrix, and the F1 score, are used to validate the expected output of lung disease detection; accuracy is defined in Equation 3.5.
Where:
Accuracy is the fraction of correctly classified instances in the test set.
True positives (TP) are instances that are actually positive in the test set and are correctly labeled as positive by the classifier. True negatives (TN) are instances that are actually negative in the test set and are correctly labeled as negative by the classifier. False positives (FP) are instances that are actually negative in the test set but are incorrectly labeled as positive by the classifier. False negatives (FN) are instances that are actually positive in the test set but are incorrectly labeled as negative by the classifier.
As a result, the positive lung image samples are denoted P, whereas the negative samples are denoted N, as shown in Equations 3.6 and 3.7. The specificity and sensitivity formulas are as follows:
Precision is the fraction of true positive instances among all instances predicted as positive by the classifier, as shown in Equation 3.8; it is computed by dividing the number of true positives by the total number of instances the classifier predicted as positive.
Recall is the fraction of true positive instances among all instances that are actually positive in the test set, as shown in Equation 3.9.
Equation 3.10 shows that the F1 score is the harmonic mean of precision and recall, providing a combined measure of precision and recall.
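Written out from the TP/TN/FP/FN definitions above, Equations 3.5-3.10 take their standard forms (a reconstruction, since the equation images are not reproduced here):

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (3.5)
Specificity = TN / (TN + FP)  (3.6)
Sensitivity = TP / (TP + FN)  (3.7)
Precision = TP / (TP + FP)  (3.8)
Recall = TP / (TP + FN)  (3.9)
F1 = 2 × (Precision × Recall) / (Precision + Recall)  (3.10)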
Results And Discussion
Introduction
This section presents the assessment results of the suggested method for fibrosis and consolidation. The efficiency of the proposed lung disease detection framework is validated using several evaluation criteria and comparisons with other models. TensorFlow and Python are used to implement the model described in this research; the complete experimental setup was run in Python using Google Colab, and Keras libraries are used to build, compile, and test the model.
Training and testing set
The dataset for lung disease detection comprises a total of 6,460 images, categorized into two distinct classes: fibrosis and consolidation. In the context of model development, 4,815 images are allocated for training, providing a diverse and representative set for the deep learning algorithm to learn patterns associated with both fibrosis and consolidation. This training set forms the foundation for the model to acquire the necessary knowledge and optimize its parameters. Subsequently, a separate testing set of 1,645 images is reserved for the assessment and validation of the trained model’s performance. This independent set ensures an unbiased evaluation of the model’s generalization capabilities, simulating real-world scenarios where the algorithm encounters unseen data. The division into training and testing sets, with a clear demarcation of images for each class, allows for a comprehensive evaluation of the algorithm’s accuracy, sensitivity, specificity, and overall efficacy in accurately identifying instances of fibrosis and consolidation in lung disease detection.
Performance measurement
In this study, the performance of the CNN architecture, Inception V3, ResNet50, MobileNet, the Swin Transformer, and the proposed Vision Transformer was examined, with each model assessed for its accuracy in distinguishing between different lung diseases. The comparison demonstrates that the proposed approach is more accurate.
CNN architecture
The results of the CNN architecture in lung disease detection demonstrate its effectiveness in advancing the state-of-the-art in medical image analysis. The CNN architecture’s success in accurately detecting lung diseases underscores its potential as a valuable tool for early diagnosis and intervention, contributing to improved patient outcomes in the realm of respiratory health.
During the training phase, the model refines its parameters to accurately classify instances within the dataset containing fibrosis and consolidation images. The training accuracy metric gauges the model’s proficiency in correctly identifying these instances, as shown in Figure 4.1. The validation phase assesses the model’s ability to generalize to new, unseen data, and the validation accuracy indicates its performance on independent samples of fibrosis and consolidation cases.
Figure 4.1: Training and validation accuracy of CNN architecture
The validation loss measures the disparity between predicted and actual values during validation, as shown in Figure 4.2. High training accuracy and low training and validation losses indicate effective learning and generalization, showcasing the CNN architecture’s capability to discern and classify patterns associated with lung diseases accurately.
Figure 4.2: Training and validation loss of CNN architecture
The confusion matrix is used for the evaluation of the classifying accuracy of the model, as shown in Figure 4.3. The confusion matrix for the CNN architecture in lung disease detection, specifically focusing on fibrosis and consolidation, provides a detailed breakdown of the model’s classification performance for these two classes. The matrix comprises four key components: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). In the context of fibrosis and consolidation detection, true positives represent instances where the model correctly identifies images containing fibrosis or consolidation. True negatives indicate accurate identifications of images without fibrosis or consolidation. False positives occur when the model wrongly predicts the presence of fibrosis or consolidation, and false negatives represent cases where the model fails to identify actual instances of fibrosis or consolidation.
Figure 4.3: Confusion matrix of CNN architecture
The evaluation metrics precision, recall, and F1 score are crucial for assessing the CNN architecture’s performance in lung disease detection, specifically targeting fibrosis and consolidation. Precision measures the model’s accuracy in correctly identifying positive instances, minimizing false positives. Recall, also known as sensitivity, assesses the model’s ability to correctly identify all relevant instances, minimizing false negatives. The F1 score provides a balance between precision and recall, offering a single metric that considers both false positives and false negatives. In the context of fibrosis and consolidation detection, these metrics collectively offer a comprehensive understanding of the proposed CNN architecture’s ability to accurately classify instances of these specific lung diseases. High precision indicates a low rate of misclassifying non-diseased instances as positive, high recall reflects effective identification of actual disease cases, and a high F1 score signifies a balanced performance between precision and recall. These metrics guide the refinement of the CNN architecture for improved diagnostic accuracy in fibrosis and consolidation detection.
Table 4.1 shows the results, which summarize the accuracy, precision, and recall generated from the confusion matrix obtained by the proposed CNN model.
Table 4.1: Performance metrics of CNN architecture
Class | Precision | Recall | F1-score | Support | Accuracy |
Fibrosis | 0.99 | 0.98 | 0.99 | 370 | 0.99 |
Consolidation | 0.99 | 0.99 | 0.99 | 450 | |
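Tables of this form can be generated with scikit-learn; a minimal sketch with placeholder labels (the class ordering, 0 = fibrosis and 1 = consolidation, is an assumption):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder ground-truth labels and softmax outputs standing in for the
# real test-set predictions.
y_true = np.array([0, 0, 1, 1])          # 0 = fibrosis, 1 = consolidation
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]])
y_pred = np.argmax(y_prob, axis=1)       # predicted class per sample

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=["Fibrosis", "Consolidation"]))
```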
ResNet-50
Notwithstanding the inherent strengths of ResNet-50, particularly its deep residual learning architecture designed to capture intricate features in medical images, there may be instances where the model struggles to achieve the desired accuracy in discerning respiratory conditions. ResNet-50 is designed to capture intricate features associated with respiratory conditions. During the training phase, the model learns from a dataset comprising images of lung diseases, optimizing its parameters to enhance accuracy. The training accuracy metric reflects the model's proficiency in correctly classifying instances within the training dataset, as shown in Figure 4.4. The validation phase assesses ResNet-50's ability to generalize to new, unseen data, with the validation accuracy indicating its performance on independent samples.
Figure 4.4: Training and validation accuracy of ResNet-50
The training loss quantifies the optimization process. The validation loss measures the disparity between predicted and actual values during validation. The training and validation loss of ResNet-50 is shown in Figure 4.5.
Figure 4.5: Training and validation loss of ResNet-50
Table 4.2 shows that the ResNet50 model distinguishes effectively between the two classes. Its precision for fibrosis is high (0.97), but recall is lower (0.88); for consolidation, recall is high (0.97) while precision is slightly lower (0.91). The model's overall accuracy is 0.94, with macro and weighted averages providing insights into performance across both classes.
Table 4.2: Performance metrics of ResNet-50
Class | Precision | Recall | F1-score | Support | Accuracy |
Fibrosis | 0.97 | 0.88 | 0.93 | 370 | 0.94 |
Consolidation | 0.91 | 0.97 | 0.95 | 450 | |
The study evaluates ResNet-50's ability to detect lung diseases, specifically fibrosis and consolidation, using a confusion matrix. True positives (TP) indicate accurate predictions of disease, true negatives (TN) indicate correctly identified non-disease cases, false positives (FP) occur when the model incorrectly predicts disease, and false negatives (FN) occur when the model misses actual disease cases. This analysis provides insight into ResNet-50's performance in lung disease diagnosis, as shown in Figure 4.6.
Figure 4.6: Confusion matrix of ResNet-50
Inception V3
In the evaluation of lung disease detection, the results obtained from Inception-V3 may not demonstrate superior performance when compared to other CNN architectures. Despite its unique architecture and multi-scale convolutional operations, Inception-V3 may face challenges in certain scenarios, exhibiting limitations in accurately capturing nuanced features associated with respiratory conditions. During the training phase, Inception-V3 learns from a dataset comprising images of lung diseases, optimizing its parameters to improve accuracy. The training accuracy metric signifies the model’s proficiency in correctly classifying instances within the training dataset. The validation phase assesses the model’s ability to generalize to new, unseen data, as indicated by the validation accuracy, as shown in Figure 4.7.
Figure 4.7: Training and validation accuracy of Inception-V3
Figure 4.8 shows the training and validation loss of Inception-V3, where the validation loss measures the difference between predicted and actual values on held-out data. High training accuracy together with low training and validation losses indicates effective learning, suggesting Inception-V3’s capacity to discern lung disease patterns.
Figure 4.8: Training and validation loss of Inception-V3
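Accuracy and loss curves like those in Figures 4.7 and 4.8 are conventionally plotted from the history returned by Keras model.fit. A generic helper, assuming matplotlib and a trained `history` object, might look like this:

```python
# Sketch: plot training vs. validation curves from a Keras History object.
import matplotlib.pyplot as plt

def plot_curves(history, metric="accuracy"):
    """Plot training and validation curves for a given metric."""
    plt.plot(history.history[metric], label=f"train {metric}")
    plt.plot(history.history[f"val_{metric}"], label=f"val {metric}")
    plt.xlabel("epoch")
    plt.ylabel(metric)
    plt.legend()
    plt.show()

# Usage after training (history = model.fit(...)):
# plot_curves(history, "accuracy"); plot_curves(history, "loss")
```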
Table 4.3 shows that Inception-V3 performs strongly in lung disease detection, with high precision, recall, and F1-score values. The model achieves 97% precision and 98% recall for fibrosis, and 99% precision and recall for consolidation, with F1-scores of 0.99 for both classes and an overall accuracy of 99%.
Table 4.3: Performance metrics of Inception-V3

Class | Precision | Recall | F1-score | Support | Accuracy (overall)
Fibrosis | 0.97 | 0.98 | 0.99 | 370 | 0.99
Consolidation | 0.99 | 0.99 | 0.99 | 450 |
MobileNet
The training and validation accuracy and loss graphs for MobileNet-based lung disease detection provide comprehensive insight into the model’s performance dynamics. Figure 4.9 presents the accuracy curves, offering a visual narrative of the model’s learning trajectory. During training, MobileNet is exposed to a diverse dataset of lung images, allowing it to learn the intricate patterns and features associated with various lung diseases. The training accuracy reflects how well the model has adapted to the training data, while high validation accuracy signifies the model’s ability to generalize to new data.
Figure 4.9: Training and validation accuracy of MobileNet
The training and validation loss graph illustrates the model’s convergence during the training process, depicting the gradual decrease in error as it refines its predictions on the lung disease dataset. The validation loss graph is instrumental in evaluating the model’s ability to generalize to new, unseen data. A decreasing trend in validation loss suggests that the model is not overfitting to the training set and is capable of making accurate predictions on diverse datasets, as shown in Figure 4.10.
Figure 4.10: Training and validation loss of MobileNet
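One common way to enforce the behaviour described above, i.e. stopping training once the validation loss stops improving, is an early-stopping callback. The sketch below assumes Keras; the patience value and variable names are illustrative:

```python
# Sketch: validation-loss monitoring with early stopping (Keras assumed).
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",         # watch the validation loss curve
    patience=5,                 # tolerate 5 epochs without improvement
    restore_best_weights=True,  # roll back to the best validation epoch
)

# history = model.fit(train_ds, validation_data=val_ds,
#                     epochs=50, callbacks=[early_stop])
```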
MobileNet’s classification results for lung disease detection show high performance, with a precision of 97% and recall of 98% for fibrosis, and a precision of 91% and recall of 98% for consolidation. The F1-scores are 0.98 for fibrosis and 0.99 for consolidation, with an overall accuracy of 99%, as shown in Table 4.4. These high precision, recall, and F1-score values highlight MobileNet’s effectiveness in distinguishing between the two conditions.
Table 4.4: Performance metrics of MobileNet

Class | Precision | Recall | F1-score | Support | Accuracy (overall)
Fibrosis | 0.97 | 0.98 | 0.98 | 370 | 0.99
Consolidation | 0.91 | 0.98 | 0.99 | 450 |
Swin Transformer
The training and validation accuracy graphs for the Swin Transformer model are pivotal components of the evaluation process, offering a nuanced view of the model’s learning dynamics during the training phase, as shown in Figure 4.11. A rising training curve indicates the model’s ability to learn lung disease patterns, while a sustained upward trend in the validation curve indicates robust performance on unseen data.
Figure 4.11: Training and validation accuracy of Swin Transformer
The training and validation loss graphs serve as critical tools for assessing the model’s training dynamics and generalization capabilities, as shown in Figure 4.12. The training loss graph illustrates the diminishing trend of predictive errors as the model refines its understanding of complex patterns within the lung disease dataset over successive epochs.
Figure 4.12: Training and validation loss of Swin Transformer
The Swin Transformer’s lung disease detection results in Table 4.5 show high accuracy and reliability: 99% precision and 98% recall for fibrosis, 98% precision and 99% recall for consolidation, and 99% overall accuracy. The model’s high precision, recall, and F1-score values demonstrate its robustness in distinguishing between the two conditions.
Table 4.5: Performance metrics of Swin Transformer

Class | Precision | Recall | F1-score | Support | Accuracy (overall)
Fibrosis (0) | 0.99 | 0.98 | 0.99 | 370 | 0.99
Consolidation (1) | 0.98 | 0.99 | 0.99 | 450 |
The Swin Transformer’s performance in detecting lung diseases, specifically fibrosis and consolidation, is evaluated using a confusion matrix consisting of four key elements: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), as shown in Figure 4.13. Interpreting these elements helps quantify the model’s ability to distinguish between lung diseases and non-disease instances.
Figure 4.13: Confusion matrix of Swin Transformer
Proposed Vision Transformer with Self-Attention
The Proposed Vision Transformer with self-attention’s performance in lung disease detection is evaluated through training and validation accuracy. Training accuracy measures the model’s adaptation to the training data, while validation accuracy assesses its generalization capabilities. High training accuracy indicates the model’s proficiency in capturing nuances, while comparable validation accuracy suggests it can generalize well to diverse instances, as shown in Figure 4.14. A comprehensive analysis of these accuracy values is crucial for ensuring accurate lung disease detection.
Figure 4.14: Training and validation accuracy of proposed model
The evaluation of the Proposed Vision Transformer with self-attention in the domain of lung disease detection encompasses a thorough examination of training and validation loss. These key metrics serve as indicators of the model’s convergence and generalization during the learning process, as shown in Figure 4.15. The training loss graph illustrates the model’s progression toward optimal performance over successive epochs. Simultaneously, the validation loss is crucial for assessing the model’s ability to generalize its learned patterns to new, unseen data.
Figure 4.15: Training and validation loss of the proposed model
The classification results for the proposed Vision Transformer with self-attention in lung disease detection are shown in Table 4.6. For class 0, representing fibrosis, the model achieves a precision of 99%, indicating that 99% of the predicted fibrosis cases are correct, and a recall of 98%, showing that it captures nearly all actual fibrosis cases; the corresponding F1-score is 99%, reflecting a harmonious balance between precision and recall. For class 1, representing consolidation, the precision is 98% and the recall is 99%, again yielding an F1-score of 99%. The overall accuracy of the model is a remarkable 99.47%, affirming its proficiency in correctly classifying instances within the dataset. These high precision, recall, and F1-score values collectively underscore the robustness of the proposed Vision Transformer with self-attention in distinguishing between fibrosis and consolidation.
Table 4.6: Performance metrics of the proposed model

Class | Precision | Recall | F1-score | Support
Fibrosis (0) | 0.99 | 0.98 | 0.99 | 370
Consolidation (1) | 0.98 | 0.99 | 0.99 | 450
Accuracy | | | 99.47% | 820
Macro avg | 0.99 | 0.99 | 0.99 | 820
Weighted avg | 0.99 | 0.99 | 0.99 | 820
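A per-class report in exactly this layout, including accuracy, macro, and weighted averages, can be produced with scikit-learn’s classification_report; the labels below are invented stand-ins for the actual test set:

```python
# Illustrative only: produces a Table 4.6-style report (labels invented).
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 1, 1, 1, 1, 1]   # 0 = fibrosis, 1 = consolidation
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

print(classification_report(
    y_true, y_pred, target_names=["Fibrosis", "Consolidation"], digits=2))
```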
The Proposed Vision Transformer’s ability to detect lung diseases, including fibrosis and consolidation, is evaluated using a confusion matrix, as shown in Figure 4.16. This matrix consists of four components: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). Analyzing these values provides insights into the Transformer’s strengths and areas for improvement in detecting lung diseases.
Figure 4.16: Confusion matrix of proposed model
Discussion
The dataset for lung disease detection includes 6,460 images categorized into fibrosis and consolidation. A diverse set of 4,815 images is used for training, while 1,645 images are used for testing. This allows for unbiased evaluation of the model’s generalization capabilities and accuracy in identifying fibrosis and consolidation in lung disease detection.
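A split of this size can be reproduced as follows. This is a sketch with placeholder file names, and the stratified split is an assumption rather than something the text specifies:

```python
# Sketch of the 4,815/1,645 split (placeholder data; stratification assumed).
from sklearn.model_selection import train_test_split

images = [f"img_{i}.png" for i in range(6460)]   # placeholder file paths
labels = [0] * 2676 + [1] * 3784                 # 0 = fibrosis, 1 = consolidation

X_train, X_test, y_train, y_test = train_test_split(
    images, labels,
    test_size=1645 / 6460,   # 1,645 test images (about 25.5%)
    stratify=labels,         # assumption: preserve the class ratio
    random_state=42)

print(len(X_train), len(X_test))   # 4815 1645
```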
Comparison with other models
The comparison between the proposed Vision Transformer with self-attention and several traditional and state-of-the-art architectures, including CNN, Inception-V3, MobileNet, ResNet-50, and Swin Transformer, provides valuable insight into their respective accuracies in lung disease detection. The traditional CNN architecture achieves a commendable accuracy of 99%, ResNet-50 achieves 94%, and MobileNet, Inception-V3, and Swin Transformer each achieve 99%. The proposed Vision Transformer with self-attention surpasses all of these, achieving an accuracy of 99.47%, as shown in Table 4.7.
Table 4.7: Comparison with pretrained models
Model | Accuracy
CNN | 99%
Inception-V3 | 99%
ResNet-50 | 94%
MobileNet | 99%
Swin Transformer | 99%
ViT with Self-Attention | 99.47%
The graphical representation in Figure 4.17 illustrates the superior performance of the proposed Vision Transformer with self-attention, which achieves an accuracy of 99.47%, in contrast to the other architectures: CNN (99%), Inception-V3 (99%), MobileNet (99%), ResNet-50 (94%), and Swin Transformer (99%).
Figure 4.17: Comparative accuracy graph
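A bar chart equivalent to Figure 4.17 can be redrawn from the accuracies in Table 4.7; matplotlib is assumed, and the axis limits are illustrative:

```python
# Sketch: comparative accuracy bar chart from Table 4.7 values.
import matplotlib.pyplot as plt

models = ["CNN", "Inception-V3", "ResNet-50", "MobileNet",
          "Swin Transformer", "ViT + self-attention"]
accuracy = [99.0, 99.0, 94.0, 99.0, 99.0, 99.47]

plt.bar(models, accuracy)
plt.ylabel("Accuracy (%)")
plt.ylim(90, 100)                    # zoom in so differences are visible
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()
```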
Comparison with previous studies
The discussion comparing the proposed model for lung disease detection with previously published studies is essential to contextualize the model’s contributions within the existing body of research, as shown in Table 4.8.
Table 4.8: Comparison with previous studies
Limitations
The model’s interpretability is limited, making it difficult to decipher which specific features drive its decisions. Data dependency and potential class imbalance in the datasets also pose challenges. Although the Vision Transformer with self-attention achieved 99.47% lung disease detection accuracy, potential dataset bias remains a concern; comprehensive and representative datasets are needed to ensure reliability across diverse patient populations and imaging conditions. Overfitting, where unusually high accuracy reflects memorization of the training data rather than genuine generalization, can undermine robustness and hinder clinical performance. Finally, interpreting the Vision Transformer’s attention mechanisms is challenging because of its complex architecture, which limits clinician confidence and slows clinical adoption.
Conclusion and Future Work
Conclusion
Lung cancer is the leading cause of cancer mortality globally, with cigarette smoking as its primary cause. It can develop in any part of the lung and is primarily derived from epithelial cells. The risk of lung disease is significant, especially in emerging and low-income nations where millions are impoverished and exposed to air pollution. The number of people dying from cancer is expected to rise further, reaching around 17 million by 2030. Lung cancer can be diagnosed using various technologies, including MRI, isotope imaging, X-ray, and CT.
The National Lung Screening Trial (NLST) aimed to determine whether low-dose CT screening could lower cancer-related mortality. Over 90% of participants adhered to the testing protocol, and the percentage of positive screening tests was 6.9% for radiographs and 24.2% for low-dose CT. False positive outcomes accounted for 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group. This study also discusses the use of CT in a paired single-step procedure for lung cancer diagnosis, comparing it to contrast-enhanced PET/CT lung tumor staging. More than 70% of deep learning studies report encouraging outcomes when using CNN-based techniques for the early detection of multi-organ cancer.
Deep learning techniques have the potential to revolutionize early diagnosis and treatment, improving patient outcomes and reducing healthcare burdens. Deep learning models can efficiently identify subtle patterns and abnormalities in medical images like chest X-rays and CT scans, enabling healthcare professionals to detect diseases at their nascent stages, enhancing the chances of successful treatment and minimizing the economic and emotional costs associated with advanced-stage diseases.
Lung diseases, including pneumonia, tuberculosis, and lung cancer, represent global health challenges. Deep learning offers opportunities to enhance diagnostic capabilities by detecting intricate patterns in medical imaging data. The research focuses on developing and refining deep learning algorithms for early and accurate detection of lung diseases using various medical imaging modalities. The dissertation aims to push the boundaries of current knowledge by creating and refining innovative deep learning algorithms for the precise and early detection of lung conditions. The research also addresses interpretability and explainability issues related to deep learning models, promoting higher trust in the decision-making process.
Lung disease detection using deep learning techniques is crucial to combat global health challenges. Traditional diagnostic methods struggle with accuracy and efficiency, especially in early disease detection. Deep learning techniques, like convolutional neural networks and recurrent neural networks, can learn intricate patterns from large datasets, making them ideal for analyzing medical images. These models can detect subtle abnormalities, enabling timely intervention and improved patient outcomes. However, challenges remain, such as interpretability, ethical considerations, and seamless integration into clinical workflows. The integration of deep learning into respiratory medicine could significantly improve public health outcomes and lung disease diagnosis and treatment.
The dataset for lung disease detection is a comprehensive collection of medical images categorized into two distinct classes, with 2,676 instances representing fibrosis and 3,784 instances depicting consolidation. It includes both CT and X-ray images, reflecting the diversity of pathological conditions affecting the lungs, and the inclusion of these two classes enables the development of models capable of discerning the nuanced patterns associated with fibrosis and consolidation.
Data preprocessing is the first phase of this work, aiming to improve image quality, leading to improved categorization and segmentation performance. This process includes tasks such as image resizing, normalization, data augmentation, background removal, histogram equalization, handling missing data, label encoding, and data splitting. Preprocessing reduces noise and unnecessary complexity in the dataset, enabling machine learning algorithms to focus on important patterns.
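As an illustration of the image-level steps (resizing, histogram equalization, normalization), a minimal OpenCV-based sketch is shown below; the function name and target size are illustrative, and the exact parameters used in this work are not restated here:

```python
# Sketch: illustrative preprocessing for a single chest image (OpenCV assumed).
import cv2
import numpy as np

def preprocess(path, size=(224, 224)):
    """Load a chest image, resize it, equalize its histogram, and normalize."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, size)             # uniform input dimensions
    img = cv2.equalizeHist(img)             # histogram equalization
    img = img.astype(np.float32) / 255.0    # scale pixel values to [0, 1]
    return img
```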
This study evaluates several deep learning models for lung disease detection. ResNet-50 serves as a feature extractor in a transfer learning approach, leveraging knowledge gained from training on the ImageNet dataset: it is initialized with pre-trained ImageNet weights and added as the second layer of a sequential model. MobileNet is chosen for its efficiency, lightweight architecture, and balance between accuracy and computational cost; the corresponding model incorporates MobileNet as its convolutional base, with additional layers for feature extraction, transformation, and classification.
The adoption of Swin Transformer in the context of lung disease detection is motivated by its unique features and capabilities that align with the intricate characteristics of medical images associated with these conditions. Swin Transformer is known for its effectiveness in capturing long-range dependencies and hierarchical features, providing an ideal solution for discerning these intricate patterns within lung images. The methodology for lung disease detection includes random cropping and horizontal flipping through data augmentation techniques, enhancing the model’s ability to generalize across diverse clinical scenarios.
The proposed model for detecting lung diseases is based on a Vision Transformer (ViT) architecture, which has shown significant success in image classification tasks. The model begins by taking input images of a specified shape and applies data augmentation to enhance its generalization capabilities. Following augmentation, the images undergo a patching process, breaking them down into patches using a Patches layer, which are then encoded through a Patch Encoder. This technique is commonly used in image-based tasks like lung disease detection, as it captures local patterns and features, enhancing the detection model’s sensitivity to subtle lung disease patterns.
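The patching step can be sketched as follows, assuming TensorFlow/Keras; the Patches layer name follows the text, while the 16-pixel patch size and all shapes are illustrative:

```python
# Sketch of the patching step (patch size illustrative, not the study's value).
import tensorflow as tf
from tensorflow.keras import layers

class Patches(layers.Layer):
    """Split an image batch into flattened non-overlapping patches."""

    def __init__(self, patch_size=16):
        super().__init__()
        self.patch_size = patch_size

    def call(self, images):
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID")
        batch = tf.shape(patches)[0]
        dim = patches.shape[-1]   # patch_size * patch_size * channels
        return tf.reshape(patches, (batch, -1, dim))

# e.g. Patches(16)(tf.zeros((1, 224, 224, 3))).shape -> (1, 196, 768)
```

A Patch Encoder then typically applies a learned linear projection and adds positional embeddings to each patch vector before the self-attention blocks.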
ResNet-50 is a deep residual learning model designed to capture intricate features in medical images. It distinguishes between the two classes with high precision for fibrosis (0.97) but lower recall (0.88), and high recall for consolidation (0.97) but lower precision (0.91); its overall accuracy is 0.94. Inception-V3 performs strongly in lung disease detection, with high precision, recall, and F1-score values. MobileNet’s classification results likewise show high performance, with a precision of 97% and recall of 98% for fibrosis, and a recall of 98% for consolidation. The Swin Transformer model shows exceptional accuracy and reliability, with 99% precision and 98% recall for fibrosis, 99% recall for consolidation, and 99% overall accuracy. The proposed Vision Transformer with self-attention achieves a remarkable 99.47% accuracy, demonstrating its robustness in distinguishing between fibrosis and consolidation.
Future work
Future research could focus on improving the interpretability of the model by developing techniques that elucidate the specific features or patterns driving its decisions. To reduce data dependency and potential class imbalance, researchers could explore innovative data augmentation techniques and strategies for building more balanced datasets. Ensuring diversity in the training data and implementing techniques to remove biases can contribute to more robust and fair model performance. Optimizing architectures for resource efficiency and fostering collaboration between computer scientists and healthcare professionals are also crucial for technologically sound and practical advances in lung disease detection.
References
Achu, A., Thomas, J., Aju, C., Gopinath, G., Kumar, S., & Reghunath, R. (2021). Machine-learning modelling of fire susceptibility in a forest-agriculture mosaic landscape of southern India. Ecological Informatics, 64, 101348.
Akter, O., Moni, M. A., Islam, M. M., Quinn, J. M., & Kamal, A. (2021). Lung cancer detection using enhanced segmentation accuracy. Applied Intelligence, 51, 3391-3404.
Alakwaa, W., Nassef, M., & Badr, A. (2017). Lung cancer detection and classification with 3D convolutional neural network (3D-CNN). International Journal of Advanced Computer Science and Applications, 8(8).
Alberg, A. J., & Samet, J. M. (2003). Epidemiology of lung cancer. Chest, 123(1), 21S-49S.
Ali, S., Li, J., Pei, Y., Khurram, R., Rehman, K. u., & Rasool, A. B. (2021). State-of-the-Art challenges and perspectives in multi-organ cancer diagnosis via deep learning-based methods. Cancers, 13(21), 5546.
Asuntha, A., & Srinivasan, A. (2020). Deep learning for lung Cancer detection and classification. Multimedia Tools and Applications, 79(11), 7731-7762.
Benmalek, E., Elmhamdi, J., & Jilbab, A. (2021). Comparing CT scan and chest X-ray imaging for COVID-19 diagnosis. Biomedical Engineering Advances, 1, 100003.
Bharati, S., Podder, P., Mondal, R., Mahmood, A., & Raihan-Al-Masud, M. (2020). Comparative performance analysis of different classification algorithms for the purpose of predicting lung cancer. Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018) held in Vellore, India, December 6-8, 2018, Volume 2.
Bibi, N., Sikandar, M., Ud Din, I., Almogren, A., & Ali, S. (2020). IoMT-based automated detection and classification of leukemia using deep learning. Journal of healthcare engineering, 2020, 1-12.
Cao, H., Liu, H., Song, E., Ma, G., Xu, X., Jin, R., . . . Hung, C.-C. (2020). A two-stage convolutional neural networks for lung nodule detection. IEEE journal of biomedical and health informatics, 24(7), 2006-2015.
Cha, M. J., Chung, M. J., Lee, J. H., & Lee, K. S. (2019). Performance of deep learning model in detecting operable lung cancer with chest radiographs. Journal of thoracic imaging, 34(2), 86-91.
Chabon, J. J., Hamilton, E. G., Kurtz, D. M., Esfahani, M. S., Moding, E. J., Stehr, H., . . . Chaudhuri, A. A. (2020). Integrating genomic features for non-invasive early lung cancer detection. Nature, 580(7802), 245-251.
Christe, A., Peters, A. A., Drakopoulos, D., Heverhagen, J. T., Geiser, T., Stathopoulou, T., . . . Ebner, L. (2019). Computer-aided diagnosis of pulmonary fibrosis using deep learning and CT images. Investigative radiology, 54(10), 627-632.
Dawoud, A. (2011). Lung segmentation in chest radiographs by fusing shape information in iterative thresholding. IET Computer Vision, 5(3), 185-190.
de Castro, A. B. G., Domínguez, J. F., Bolton, R. D., Pérez, C. F., Martínez, B. C., García-Esquinas, M. G., & Delgado, J. C. (2017). PET-CT in presurgical lymph node staging in non-small cell lung cancer: The importance of false-negative and false-positive findings. Radiología (English Edition), 59(2), 147-158.
Dhaware, B. U., & Pise, A. C. (2016). Lung cancer detection using Bayesian classifier and FCM segmentation. 2016 International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT).
Dutta, K. (2021). Densely connected recurrent residual (Dense R2UNet) convolutional neural network for segmentation of lung CT images. arXiv preprint arXiv:2102.00663.
Eun, H., Kim, D., Jung, C., & Kim, C. (2018). Single-view 2D CNNs with fully automatic non-nodule categorization for false positive reduction in pulmonary nodule detection. Computer methods and programs in biomedicine, 165, 215-224.
Feng, Y., Hao, P., Zhang, P., Liu, X., Wu, F., & Wang, H. (2019). Supervoxel based weakly-supervised multi-level 3D CNNs for lung nodule detection and segmentation. Journal of Ambient Intelligence and Humanized Computing, 1-11.
Fernandes, S. L., Gurupur, V. P., Lin, H., & Martis, R. J. (2017). A novel fusion approach for early lung cancer detection using computer-aided diagnosis techniques. Journal of Medical Imaging and Health Informatics, 7(8), 1841-1850.
Filipska, M., & Rosell, R. (2021). Mutated circulating tumor DNA as a liquid biopsy in lung cancer detection and treatment. Molecular Oncology, 15(6), 1667-1682.
Franck, C., Snoeckx, A., Spinhoven, M., El Addouli, H., Nicolay, S., Van Hoyweghen, A., . . . Zanca, F. (2021). Pulmonary nodule detection in chest CT using a deep learning-based reconstruction algorithm. Radiation protection dosimetry, 195(3-4), 158-163.
Gan, W., Wang, H., Gu, H., Duan, Y., Shao, Y., Chen, H., . . . Ying, Y. (2021). Automatic segmentation of lung tumors on CT images based on a 2D & 3D hybrid convolutional neural network. The British Journal of Radiology, 94, 20210038.
George, J., Skaria, S., & Varun, V. (2018). Using YOLO based deep learning network for real-time detection and localization of lung nodules from low-dose CT scans. Medical Imaging 2018: Computer-Aided Diagnosis.
Ghali, R., & Akhloufi, M. A. (2023). Vision transformers for lung segmentation on CXR images. SN Computer Science, 4(4), 414.
Ghimire, S., & Subedi, S. (2024). Estimating Lung Volume Capacity from X-ray Images Using Deep Learning. Quantum Beam Science, 8(2), 11.
Goyal, S., & Singh, R. (2021). Detection and classification of lung diseases for pneumonia and COVID-19 using machine and deep learning techniques. Journal of Ambient Intelligence and Humanized Computing, 1-21.
Goyal, S., & Singh, R. (2023). Detection and classification of lung diseases for pneumonia and COVID-19 using machine and deep learning techniques. Journal of Ambient Intelligence and Humanized Computing, 14(4), 3239-3259.
Hamidian, S., Sahiner, B., Petrick, N., & Pezeshk, A. (2017). 3D convolutional neural network for automatic detection of lung nodules in chest CT. Medical Imaging 2017: Computer-Aided Diagnosis.
Hitimana, E., Bajpai, G., Musabe, R., Sibomana, L., & Kayalvizhi, J. (2021). Implementation of an IoT framework with data analysis using deep learning methods for occupancy prediction in a building. Future Internet, 13(3), 67.
Ippolito, D., Capraro, C., Guerra, L., De Ponti, E., Messa, C., & Sironi, S. (2013). Feasibility of perfusion CT technique integrated into conventional 18FDG/PET-CT studies in lung cancer patients: clinical staging and functional information in a single study. European journal of nuclear medicine and molecular imaging, 40, 156-165.
Jemal, A., Center, M. M., DeSantis, C., & Ward, E. M. (2010). Global patterns of cancer incidence and mortality rates and trends. Cancer epidemiology, biomarkers & prevention, 19(8), 1893-1907.
Jiang, J., Hu, Y.-C., Liu, C.-J., Halpenny, D., Hellmann, M. D., Deasy, J. O., . . . Veeraraghavan, H. (2018). Multiple resolution residually connected feature streams for automatic lung tumor segmentation from CT images. IEEE transactions on medical imaging, 38(1), 134-144.
Khosravan, N., & Bagci, U. (2018). Semi-supervised multi-task learning for lung cancer diagnosis. 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
Kim, H. M., Ko, T., Choi, I. Y., & Myong, J.-P. (2022). Asbestosis diagnosis algorithm combining the lung segmentation method and deep learning model in computed tomography images. International Journal of Medical Informatics, 158, 104667.
Kooi, T., Litjens, G., Van Ginneken, B., Gubern-Mérida, A., Sánchez, C. I., Mann, R., . . . Karssemeijer, N. (2017). Large-scale deep learning for computer-aided detection of mammographic lesions. Medical image analysis, 35, 303-312.
Kuan, K., Ravaut, M., Manek, G., Chen, H., Lin, J., Nazir, B., . . . Chandrasekhar, V. (2017). Deep learning for lung cancer detection: tackling the Kaggle data science bowl 2017 challenge. arXiv preprint arXiv:1705.09435.
Lakshmanaprabu, S., Mohanty, S. N., Shankar, K., Arunkumar, N., & Ramirez, G. (2019). Optimal deep learning model for classification of lung cancer on CT images. Future Generation Computer Systems, 92, 374-382.
Lee, W.-K., Lau, E. W., Chin, K., Sedlaczek, O., & Steinke, K. (2013). Modern diagnostic and therapeutic interventional radiology in lung cancer. Journal of Thoracic Disease, 5(Suppl 5), S511.
Lin, X., Jiao, H., Pang, Z., Chen, H., Wu, W., Wang, X., . . . Li, S. (2021). Lung cancer and granuloma identification using a deep learning model to extract 3-dimensional radiomics features in CT imaging. Clinical Lung Cancer, 22(5), e756-e766.
Liu, X., Hou, F., Qin, H., & Hao, A. (2018). Multi-view multi-scale CNNs for lung nodule type classification from CT images. Pattern Recognition, 77, 262-275.
Lu, H. (2021). Computer-aided diagnosis research of a lung tumor based on a deep convolutional neural network and global features. BioMed Research International, 2021.
Ma, J., Song, Y., Tian, X., Hua, Y., Zhang, R., & Wu, J. (2020). Survey on deep learning for pulmonary medical imaging. Frontiers of medicine, 14, 450-469.
Masood, A., Sheng, B., Li, P., Hou, X., Wei, X., Qin, J., & Feng, D. (2018). Computer-assisted decision support system in pulmonary cancer detection and stage classification on CT images. Journal of biomedical informatics, 79, 117-128.
Mondal, M. R. H., Bharati, S., Podder, P., & Podder, P. (2020). Data analytics for novel coronavirus disease. Informatics in medicine unlocked, 20, 100374.
Munawar, F., Azmat, S., Iqbal, T., Grönlund, C., & Ali, H. (2020). Segmentation of lungs in chest X-ray image using generative adversarial networks. IEEE Access, 8, 153535-153545.
Nam, J. G., Park, S., Hwang, E. J., Lee, J. H., Jin, K.-N., Lim, K. Y., . . . Goo, J. M. (2019). Development and validation of deep learning–based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology, 290(1), 218-228.
Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B., & Chua, T.-S. (2015). Disease inference from health-related questions via sparse deep learning. IEEE Transactions on Knowledge and Data Engineering, 27(8), 2107-2119.
Nour, M., Cömert, Z., & Polat, K. (2020). A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization. Applied Soft Computing, 97, 106580.
Pal, O. K., Roy, S., Modok, A. K., Teethi, T. I., & Sarker, S. K. (2024). ULung: A Novel Approach for Lung Image Segmentation. 2024 6th International Conference on Computing and Informatics (ICCI).
Park, S. C., Tan, J., Wang, X., Lederman, D., Leader, J. K., Kim, S. H., & Zheng, B. (2011). Computer-aided detection of early interstitial lung diseases using low-dose CT images. Physics in Medicine & Biology, 56(4), 1139.
Peng, T., Wang, C., Zhang, Y., & Wang, J. (2022). H-SegNet: hybrid segmentation network for lung segmentation in chest radiographs using mask region-based convolutional neural network and adaptive closed polyline searching method. Physics in Medicine & Biology, 67(7), 075006.
Priyadarsini, M. J. P., Rajini, G., Hariharan, K., Raj, K. U., Ram, K. B., Indragandhi, V., . . . Pandya, S. (2023). Lung diseases detection using various deep learning algorithms. Journal of Healthcare Engineering, 2023.
Protonotarios, N. E., Katsamenis, I., Sykiotis, S., Dikaios, N., Kastis, G. A., Chatziioannou, S. N., . . . Doulamis, A. (2022). A few-shot U-Net deep learning model for lung cancer lesion segmentation via PET/CT imaging. Biomedical Physics & Engineering Express, 8(2), 025019.
Rahman, T., Chowdhury, M. E., Khandakar, A., Islam, K. R., Islam, K. F., Mahbub, Z. B., . . . Kashem, S. (2020). Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Applied Sciences, 10(9), 3233.
Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., . . . Shpanskaya, K. (2017). Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225.
Riquelme, D., & Akhloufi, M. A. (2020). Deep learning for lung cancer nodules detection and classification in CT scans. AI, 1(1), 28-67.
Roy, S., & Das, A. K. (2023). Deep‐CoV: An integrated deep learning model to detect COVID‐19 using chest X‐ray and CT images. Computational Intelligence, 39(2), 369-400.
Salvi, M., Mogetta, A., Raghavendra, U., Gudigar, A., Acharya, U. R., & Molinari, F. (2024). A Dynamic Uncertainty-Aware Ensemble Model: Application to Lung Cancer Segmentation in Digital Pathology. Applied Soft Computing, 112081.
Schwyzer, M., Ferraro, D. A., Muehlematter, U. J., Curioni-Fontecedro, A., Huellner, M. W., Von Schulthess, G. K., . . . Messerli, M. (2018). Automated detection of lung cancer at ultralow dose PET/CT by deep neural networks–initial results. Lung Cancer, 126, 170-173.
Selvanambi, R., Natarajan, J., Karuppiah, M., Islam, S. H., Hassan, M. M., & Fortino, G. (2020). Lung cancer prediction using higher-order recurrent neural network based on glowworm swarm optimization. Neural Computing and Applications, 32, 4373-4386.
Setio, A. A. A., Traverso, A., De Bel, T., Berens, M. S., Van Den Bogaard, C., Cerello, P., . . . Geurts, B. (2017). Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Medical image analysis, 42, 1-13.
Shen, D., Wu, G., & Suk, H.-I. (2017). Deep learning in medical image analysis. Annual review of biomedical engineering, 19, 221-248.
Shim, S. S., Lee, K. S., Kim, B.-T., Chung, M. J., Lee, E. J., Han, J., . . . Kim, S. (2005). Non–small cell lung cancer: prospective comparison of integrated FDG PET/CT and CT alone for preoperative staging. Radiology, 236(3), 1011-1019.
Shin, H.-C., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., & Summers, R. M. (2016). Learning to read chest x-rays: Recurrent neural cascade model for automated image annotation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Shin, H.-C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., . . . Summers, R. M. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE transactions on medical imaging, 35(5), 1285-1298.
Silva, G. L. F. d., Carvalho Filho, A. O. d., Silva, A. C., Paiva, A. C. d., & Gattass, M. (2016). Taxonomic indexes for differentiating malignancy of lung nodules on CT images. Research on Biomedical Engineering, 32, 263-272.
Song, Y., Zheng, S., Li, L., Zhang, X., Zhang, X., Huang, Z., . . . Chong, Y. (2021). Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images. IEEE/ACM transactions on computational biology and bioinformatics, 18(6), 2775-2780.
Sourav, M. S. U., & Wang, H. (2023). Intelligent identification of jute pests based on transfer learning and deep convolutional neural networks. Neural Processing Letters, 55(3), 2193-2210.
Souza, J. C., Diniz, J. O. B., Ferreira, J. L., Da Silva, G. L. F., Silva, A. C., & de Paiva, A. C. (2019). An automatic method for lung segmentation and reconstruction in chest X-ray using deep neural networks. Computer methods and programs in biomedicine, 177, 285-296.
Sriporn, K., Tsai, C.-F., Tsai, C.-E., & Wang, P. (2020). Analyzing lung disease using highly effective deep learning techniques. Healthcare.
Swapna, Y., Sirisha, G., Mary, G. A., Aparna, G., Ruthvik, K., & Suneela, B. (2021). Covid-19-Based Intelligent Healthcare Monitoring System Using Deep Learning & Iot Principles For Pulmonary Breathing Problem Patients. International Journal of Future Generation Communication and Networking, 14(1), 1289-1297.
Tan, J., Huo, Y., Liang, Z., & Li, L. (2019). Expert knowledge-infused deep learning for automatic lung nodule detection. Journal of X-ray Science and Technology, 27(1), 17-35.
Team, N. L. S. T. R. (2011). Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine, 365(5), 395-409.
Teixeira, L. O., Pereira, R. M., Bertolini, D., Oliveira, L. S., Nanni, L., Cavalcanti, G. D., & Costa, Y. M. (2021). Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest X-ray images. Sensors, 21(21), 7116.
Teramoto, A., Yamada, A., Tsukamoto, T., Imaizumi, K., Toyama, H., Saito, K., & Fujita, H. (2020). Decision support system for lung cancer using PET/CT and microscopic images. Deep Learning in Medical Image Analysis: Challenges and Applications, 73-94.
Theresa, M. M., & Bharathi, V. S. (2016). CAD for lung nodule detection in chest radiography using complex wavelet transform and shearlet transform features. Indian Journal of Science and Technology, 9(1), 1-12.
Tripathi, S., Shetty, S., Jain, S., & Sharma, V. (2021). Lung disease detection using deep learning. Int. J. Innov. Technol. Explor. Eng, 10(8).
Ucar, F., & Korkmaz, D. (2020). COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Medical hypotheses, 140, 109761.
Ukwuoma, C. C., Qin, Z., Heyat, M. B. B., Akhtar, F., Smahi, A., Jackson, J. K., . . . Nneji, G. U. (2022). Automated lung-related pneumonia and COVID-19 detection based on novel feature extraction framework and vision transformer approaches using chest X-ray images. Bioengineering, 9(11), 709.
Vaidya, P., Bera, K., Patil, P. D., Gupta, A., Jain, P., Alilou, M., . . . Madabhushi, A. (2020). Novel, non-invasive imaging approach to identify patients with advanced non-small cell lung cancer at risk of hyperprogressive disease with immune checkpoint blockade. Journal for Immunotherapy of Cancer, 8(2).
Vaishya, R., Vijay, V., Birla, V. P., & Agarwal, A. K. (2016). Inter-observer variability and its correlation to experience in measurement of lower limb mechanical axis on long leg radiographs. Journal of Clinical Orthopaedics and Trauma, 7(4), 260-264.
Wang, L., Lin, Z. Q., & Wong, A. (2020). COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Scientific reports, 10(1), 19549.
Wang, S., Kang, B., Ma, J., Zeng, X., Xiao, M., Guo, J., . . . Meng, X. (2021). A deep learning algorithm using CT images to screen for Coronavirus Disease (COVID-19). European radiology, 31, 6096-6104.
Wang, S., Zhou, M., Gevaert, O., Tang, Z., Dong, D., Liu, Z., & Jie, T. (2017). A multi-view deep convolutional neural networks for lung nodule segmentation. 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
Wang, Y., Wu, B., Zhang, N., Liu, J., Ren, F., & Zhao, L. (2020). Research progress of computer aided diagnosis system for pulmonary nodules in CT images. Journal of X-ray Science and Technology, 28(1), 1-16.
Wu, C., Luo, C., Xiong, N., Zhang, W., & Kim, T.-H. (2018). A greedy deep learning method for medical disease analysis. IEEE Access, 6, 20021-20030.
Yu, K.-H., Zhang, C., Berry, G. J., Altman, R. B., Ré, C., Rubin, D. L., & Snyder, M. (2016). Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature communications, 7(1), 12474.
Zhang, M., Li, H., Pan, S., Lyu, J., Ling, S., & Su, S. (2021). Convolutional neural networks-based lung nodule classification: A surrogate-assisted evolutionary algorithm for hyperparameter optimization. IEEE Transactions on Evolutionary Computation, 25(5), 869-882.
Zhong, Z., Kim, Y., Plichta, K., Allen, B. G., Zhou, L., Buatti, J., & Wu, X. (2019). Simultaneous cosegmentation of tumors in PET‐CT images using deep fully convolutional networks. Medical physics, 46(2), 619-633.
Zhu, W., Liu, C., Fan, W., & Xie, X. (2018). Deep Lung: Deep 3d dual-path nets for automated pulmonary nodule detection and classification. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).