Description
AI-driven active infrared thermography (AIRT) has emerged as a promising modality for automated reliability assurance and safety monitoring across industrial and aerospace domains. By capturing the transient thermal response of materials under controlled excitation, AIRT enables non-destructive detection of subsurface defects such as impact damage, delaminations, and voids. Despite recent progress in applying deep learning to AIRT inspection, current AI pipelines remain fundamentally constrained by the scarcity of annotated thermographic datasets. Acquiring high-quality labels requires expert interpretation, controlled laboratory setups, and extensive post-processing, which makes large-scale datasets costly to obtain and limits how well trained models generalize to new defect types, geometries, and inspection conditions.
Vision–language models (VLMs) introduce a new paradigm that can mitigate these limitations. By jointly reasoning over image and natural-language cues, VLMs enable zero-shot inference, in which defect analysis is guided by textual instructions rather than domain-specific supervised training. This greatly reduces the dependency on specialized data collection and allows the user to describe defect characteristics, severity, and inspection intent in natural language. This work therefore proposes a language-guided framework that uses multimodal cues for cognitive defect detection in AIRT. Instead of constructing large labeled thermography datasets and training custom deep networks, the proposed framework employs recent VLMs to perform zero-shot subsurface defect detection. Three representative models (CogVLM, Qwen-VL, and GroundingDINO) are evaluated on a thermographic dataset of twenty-five carbon-fiber-reinforced polymer (CFRP) specimens subjected to controlled impact damage at varying energy levels.
The specimens include weak defects generated by 5 J impacts and strong defects created by 15 J impacts, providing a structured range of defect severity for evaluation. Qualitative results demonstrate that the models generate coherent scene-level descriptions while also identifying and localizing defects with notable fidelity. Quantitatively, the proposed framework attains a defect-detection Intersection-over-Union (IoU) of approximately 70% in zero-shot mode, without any dataset curation, network development, or supervised training. These findings highlight the feasibility of deploying VLMs for automated, zero-shot AIRT inspection, reducing dependence on expert-annotated datasets and bridging the gap between manual defect interpretation and scalable AI-driven inspection frameworks. The results further illustrate how language-guided reasoning can accelerate the integration of AIRT into flexible manufacturing and maintenance environments, paving the way toward more adaptive and cognitively guided non-destructive evaluation workflows.
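For readers unfamiliar with the IoU metric quoted above, the following is a minimal sketch of how it can be computed for a single predicted defect region against a reference region. It assumes axis-aligned bounding boxes in pixel coordinates; the abstract does not state whether boxes or segmentation masks were scored, and the function name `box_iou` and the example coordinates are hypothetical, not taken from the work itself.

```python
from typing import Tuple

# Assumed box format (not specified in the abstract): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, truth: Box) -> float:
    """Return the Intersection-over-Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix_min = max(pred[0], truth[0])
    iy_min = max(pred[1], truth[1])
    ix_max = min(pred[2], truth[2])
    iy_max = min(pred[3], truth[3])

    # Clamp width and height at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Illustrative coordinates only: a prediction shifted relative to the reference.
    predicted = (0.0, 0.0, 10.0, 10.0)
    reference = (5.0, 0.0, 15.0, 10.0)
    print(f"IoU = {box_iou(predicted, reference):.3f}")
```

Under this definition, an IoU of about 70% means the overlap between predicted and reference regions covers roughly 70% of their combined area; a dataset-level figure would typically be an average of such per-defect values, although the averaging scheme used here is not stated.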