Late Lucid Lectures Guild

Science, softly spoken.

Artificial Intelligence

  • Enhancing Visual Question Answering on Satellite Imagery with Geospatial Chain of Thought Reasoning

    Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery

    By Shambhavi Shanker, Manikandan Padmanaban, Jagabondhu Hazra

    DOI https://doi.org/10.48550/arXiv.2511.11198

    Abstract

    Geospatial chain of thought (CoT) reasoning is essential for advancing Visual Question Answering (VQA) on satellite imagery, particularly in climate-related applications such as disaster monitoring, infrastructure risk assessment, urban resilience planning, and policy support. Existing VQA models enable scalable interpretation of remote sensing data but often lack the structured reasoning required for complex geospatial queries. We propose a VQA framework that integrates CoT reasoning with Direct Preference Optimization (DPO) to improve interpretability, robustness, and accuracy. By generating intermediate rationales, the model better handles tasks involving detection, classification, spatial relations, and comparative analysis, which are critical for reliable decision support in high-stakes climate domains. Experiments show that CoT supervision improves accuracy by 34.9% over direct baselines, while DPO yields additional gains in accuracy and reasoning quality. The resulting system advances VQA for multispectral Earth observation by enabling richer geospatial reasoning and more effective climate use cases.

    Introduction

    The impacts of climate change—such as floods, wildfires, and extreme weather—necessitate accurate analysis of Earth observation data. Satellite images provide comprehensive information that is critical for assessing disasters and planning for climate resilience. Manual analysis of these images is labor-intensive, and traditional machine learning methods are often too narrow.

    Vision Language Models (VLMs) now allow users to ask questions in natural language about imagery and receive grounded answers. This capability is essential for timely and informed responses to disasters, such as understanding flood mapping or wildfire monitoring.

    Despite recent advancements, current VLMs still struggle with reasoning, particularly complex reasoning that involves multiple steps or causal connections. This limitation can hamper decision-making accuracy in critical situations. The paper highlights previous research indicating that CoT reasoning enhances model interpretability and robustness. However, such reasoning approaches have been underutilized in geospatial analysis.

    The authors aim to bridge the gap by unifying reasoning-augmented supervision and preference-based alignment, creating models that are reliable and interpretable for climate-related applications.

    Methodology

    1. Chain-of-Thought Data Distillation: Existing question-answer data was enriched with reasoning. A model generated step-by-step explanations for each answer from the satellite image and the question, yielding a more comprehensive training dataset.

    2. Supervised Fine-Tuning (SFT): The fine-tuning stage involved training using two types of data inputs: direct question-answer pairs and question-rationale-answer combinations. Different training strategies were adopted to optimize performance.

    3. Reinforcement Learning with Direct Preference Optimization: This stage refines the model’s ability to produce coherent, user-preferred outputs by training on pairs of preferred and rejected responses, which improves reasoning quality.
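
    The two supervised data formats and the preference step can be sketched as follows. This is a minimal illustration, not the authors' code: the prompt layout, the function names, and the default beta are assumptions, and the loss is the standard single-pair DPO objective, which takes the summed log-probability of each response under the trained policy and under a frozen reference model.

```python
import math


def format_sft_example(question, answer, rationale=None):
    """Build one supervised fine-tuning example (hypothetical text layout).

    Without a rationale this is a direct question-answer pair; with one,
    it is a question-rationale-answer example.
    """
    if rationale is None:
        target = f"Answer: {answer}"
    else:
        target = f"Reasoning: {rationale}\nAnswer: {answer}"
    return {"prompt": f"Question: {question}", "target": target}


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair."""
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return _softplus(-logits)


direct = format_sft_example("Is the road flooded?", "yes")
with_cot = format_sft_example(
    "Is the road flooded?", "yes",
    rationale="Water covers the road surface in the lower half of the image.",
)

# The loss is log(2) when the policy matches the reference, and it falls
# as the policy shifts probability toward the chosen response.
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)
aligned = dpo_loss(-5.0, -9.0, -6.0, -6.0, beta=0.5)
```

    The loss depends only on how far the policy has moved relative to the reference, so a response that both models already find likely contributes no gradient by itself.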

    Results

    Experiments revealed that CoT reasoning significantly enhances the model’s performance. The details include:

    • An overall accuracy gain of 34.9% over direct-answer baselines.
    • Improved transferability on different datasets, specifically demonstrating that CoT reasoning boosts the model’s ability to adapt its knowledge to disaster imagery.

    For example, the model tested on a flood imagery dataset (FloodNet) achieved an accuracy increase from 59.1% to 67.4% with CoT data, showcasing its potential for better generalization of reasoning across scenarios.
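
    This summary does not say whether its percentage gains are absolute (percentage points) or relative to the baseline, so it helps to compute both readings. The short sketch below does that for the FloodNet figures quoted above; the function name is illustrative.

```python
def accuracy_gains(baseline_pct, improved_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved_pct - baseline_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative


absolute, relative = accuracy_gains(59.1, 67.4)
print(f"absolute gain: {absolute:.1f} percentage points")  # 8.3
print(f"relative gain: {relative:.1f}%")                   # 14.0%
```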

    However, the model had some limitations, particularly in handling counting questions, indicating that more advanced numerical reasoning approaches may be needed in future models.

    Conclusion

    The paper concludes that using CoT supervision improves both the accuracy and interpretability of geospatial VQA systems. The framework not only enhances decision-making processes for climate-related challenges but also fosters trust through understandable reasoning. While significant strides have been made, challenges persist, particularly in numerical reasoning and adapting the model across different data contexts.

    Overall, the research indicates that structured reasoning could be pivotal for advancing reliable geospatial AI systems capable of tackling complex climate issues in a trustworthy manner.

  • Revolutionizing E-Commerce with AI: Automated Product Knowledge Graph Construction

    AI Agent-Driven Framework for Automated Product Knowledge Graph Construction in E-Commerce

    By Dimitar Peshevski, Riste Stojanov, Dimitar Trajanov

    DOI https://doi.org/10.48550/arXiv.2511.11017

    Abstract

    The rapid growth of e-commerce platforms has led to an overflow of unstructured product data, which poses challenges for information retrieval, recommendation systems, and data analytics. Knowledge Graphs, which are structured representations of data, are crucial for organizing this information. However, constructing product-specific Knowledge Graphs is often a manual and complex task. This paper presents an automated framework powered by Artificial Intelligence agents to create Knowledge Graphs using unstructured product descriptions. The proposed method is divided into three stages—ontology creation and expansion, ontology refinement, and Knowledge Graph population—utilizing Large Language Models. The evaluation on a dataset of air conditioner descriptions shows the framework’s high effectiveness, achieving over 97% property coverage and demonstrating its scalability for intelligent product data integration.

    Introduction

    E-commerce and retail platforms are generating significant amounts of unstructured product information, such as descriptions, specifications, and reviews. To utilize this data for applications like product recommendations and analytics, it must be structured into a machine-readable form. Knowledge Graphs help achieve this by representing entities (like products) and their relationships in a graph format.

    Despite their utility, creating Knowledge Graphs is typically a manual and labor-intensive process that requires domain-specific knowledge. This paper introduces an automated framework utilizing AI agents to construct Knowledge Graphs specifically for product domains. By employing Large Language Models, the framework automates the creation and refinement of product ontologies and directly generates Knowledge Graphs from product descriptions.

    Methodology

    The framework consists of three major stages:

    1. Ontology Creation and Expansion: The process starts by sampling product descriptions to identify essential ontology elements, like product classes and attributes, and organizing them into a structured format. This stage iteratively incorporates more product samples to expand the ontology by adding new classes or properties.

    2. Ontology Refinement: This stage enhances the initial ontology using the capabilities of Large Language Models. It addresses any issues of redundancy, generality, or clarity within the ontology to improve its usability and flexibility across different product types.

    3. Knowledge Graph Population: The last stage populates the Knowledge Graph with specific product data derived from the descriptions. This step generates RDF (Resource Description Framework) triples, which represent the relationships and attributes of products. The framework is designed to represent the data accurately and to avoid generating information that is not present in the descriptions.
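
    The population stage can be illustrated with a small sketch that turns already-extracted attributes into N-Triples lines. Everything here is an assumption made for illustration: the `example.org` namespaces, the property names, and the hand-written `extracted` dictionary, which stands in for the output of the LLM agents. Dropping properties that are not in the ontology is one simple guard against spurious triples, not necessarily the mechanism the authors use.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/product/"    # hypothetical product namespace
ONT = "http://example.org/ontology#"    # hypothetical ontology namespace


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def populate(product_id, product_class, extracted, ontology_properties):
    """Return (triples, skipped) for one product description.

    Only properties defined in the ontology are emitted; anything else is
    reported in `skipped` instead of being written to the graph.
    """
    subject = f"<{BASE}{product_id}>"
    triples = [f"{subject} <{RDF_TYPE}> <{ONT}{product_class}> ."]
    skipped = []
    for prop, value in extracted.items():
        if prop not in ontology_properties:
            skipped.append(prop)
            continue
        triples.append(f'{subject} <{ONT}{prop}> "{escape_literal(value)}" .')
    return triples, skipped


ontology_properties = {"coolingCapacity", "energyClass"}
extracted = {"coolingCapacity": "3.5 kW", "energyClass": "A++", "colour": "white"}

triples, skipped = populate("ac-001", "AirConditioner", extracted, ontology_properties)
for line in triples:
    print(line)
print("skipped:", skipped)
```

    In this toy run the `colour` attribute is skipped because the toy ontology does not define it; in an iterative pipeline such skipped properties could instead feed back into the ontology expansion stage.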

    Evaluation

    The authors evaluated the framework on a dataset consisting of 291 product descriptions for air conditioners. The evaluation focused on three key areas:

    • Ontology Coverage: It measured how completely the ontology captured product classes, attributes, and relationships.
    • Ontology Quality: This involved a qualitative assessment of coherence, generality, and usability.
    • Knowledge Graph Population: They assessed the number of generated RDF triples and how many properties from the ontology were instantiated in the Knowledge Graph.

    The results showed that the framework constructed a modular and comprehensive ontology covering 42 classes and 69 properties. It processed 282 of the 291 descriptions and achieved a property coverage of 97.1%, which the authors present as evidence of the framework’s effectiveness and robustness.
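
    Property coverage, as described in the evaluation criteria above, is the share of ontology properties that are instantiated at least once in the populated graph. A minimal sketch of that metric follows, using toy data rather than the paper's ontology; the function name and the tuple representation of triples are assumptions.

```python
def property_coverage(ontology_properties, triples):
    """Percentage of ontology properties used by at least one triple.

    `triples` is an iterable of (subject, predicate, object) tuples.
    """
    ontology_properties = set(ontology_properties)
    if not ontology_properties:
        raise ValueError("ontology has no properties")
    used = {predicate for _, predicate, _ in triples}
    instantiated = used & ontology_properties
    return 100.0 * len(instantiated) / len(ontology_properties)


toy_ontology = {"brand", "coolingCapacity", "energyClass", "noiseLevel"}
toy_triples = [
    ("ac-001", "brand", "Acme"),
    ("ac-001", "coolingCapacity", "3.5 kW"),
    ("ac-002", "brand", "Acme"),
    ("ac-002", "energyClass", "A++"),
]

coverage = property_coverage(toy_ontology, toy_triples)
print(f"property coverage: {coverage:.1f}%")  # 75.0%
```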

    Conclusion and Future Work

    The proposed AI agent-driven framework represents a significant advancement in automating the construction of Knowledge Graphs for e-commerce. It substantially reduces the need for manual processes, allowing faster adaptation to new products.

    Future enhancements could include integrating various types of data (like images and user reviews) to enrich the Knowledge Graph further. Additionally, efforts could be directed towards improving the accuracy of data extraction and expanding the framework’s application to other domains, such as finance or healthcare.

    The framework promises to lay a strong foundation for advanced applications in e-commerce, such as improved product recommendations and search functionality.

  • Navigating Generative AI: Bangladeshi Journalists’ Insights and Challenges

    Generative Artificial Intelligence Adoption Among Bangladeshi Journalists: Exploring Journalists’ Awareness, Acceptance, Usage, and Organizational Stance on Generative AI

    By H. M. Murtuza, Md Oliullah

    DOI https://doi.org/10.48550/arXiv.2511.10862

    Abstract

    Newsrooms and journalists across the world are adopting Generative AI (GenAI). Drawing on in-depth interviews with 23 journalists, this study identifies Bangladeshi journalists’ awareness, acceptance, usage patterns, and their media organizations’ stance toward GenAI. This study finds Bangladeshi journalists’ high reliance on GenAI like their Western colleagues despite limited institutional support and the near absence of AI policy. Despite this contrast, concerns over GenAI’s implications in journalism between the West and non-West were mostly identical. Moreover, this study contributes to the Unified Theory of Acceptance and Use of Technology (UTAUT) by proposing two changes regarding GenAI adoption among journalists in non-Western settings. First, this study identifies the non-contribution of facilitating conditions in shaping behavioral intent in GenAI adoption in non-Western contexts. Second, social influence works in a horizontal order through informal peer pressure or professional motivation in the absence of formal institutional hierarchical pressure. Voluntariness in the context of Bangladeshi journalists is underpinned by their professional compulsion. Therefore, this study contributes to understanding how contextual factors shape technology adoption trajectories in non-Western journalism.

    Summary of the Study on Generative Artificial Intelligence Adoption Among Bangladeshi Journalists

    Overview

    This academic paper explores how journalists in Bangladesh are adopting Generative Artificial Intelligence (AI) technologies, examining their awareness, acceptance, usage, and the stance of their organizations towards these technologies. Through interviews with 23 journalists, the study finds substantial reliance on AI despite limited institutional support and absence of formal policies for its use.

    Abstract Analysis

    The study reveals that Bangladeshi journalists use Generative AI similarly to their Western counterparts, despite the significant lack of institutional frameworks and AI policies in their newsrooms. The findings show that concerns regarding AI’s implications for journalism—such as accuracy and ethical issues—are consistent with those identified in Western contexts. The study also contributes to the Unified Theory of Acceptance and Use of Technology by suggesting modifications specific to non-Western contexts.

    Introduction Analysis

    The introduction sets the stage by noting the global emergence and significance of Generative AI, with specific reference to tools such as ChatGPT. It highlights how this new technology, capable of generating high-quality content using natural language processing, has altered journalism practices globally. Although earlier AI tools were already in use, the content-creation capabilities of Generative AI mark a notable shift in journalism.

    The paper emphasizes that while previous studies have focused largely on Western audiences, understanding the adoption of AI in non-Western, developing contexts like Bangladesh is crucial due to differing socio-economic conditions, technology accessibility, and institutional support levels.

    Methodology

    The researchers conducted semi-structured interviews with journalists from various levels of experience and different types of news organizations, including newspapers, online portals, and television. The qualitative data were analyzed through open, axial, and selective coding, allowing for a deep understanding of the themes related to AI adoption.

    Findings

    Awareness and Usage Patterns

    • Awareness: The journalists reported increasing awareness of Generative AI tools. Many learned about these tools informally from colleagues and peers.
    • Usage Patterns: AI is commonly used for various tasks including information gathering, scriptwriting, brainstorming, editing, and multimedia assistance. Notable tools mentioned include ChatGPT, Google Translate, and Grammarly.

    Benefits of AI Adoption

    1. Efficiency: Journalists noted a significant increase in efficiency and productivity, with many able to complete tasks more quickly using AI tools.
    2. Quality of Work: AI tools provide support in drafting and editing, improving overall content quality and helping to manage large volumes of information.
    3. Competitive Necessity: The pressure to adopt AI to stay competitive in the growing digital landscape was emphasized, with journalists feeling compelled to utilize AI to avoid falling behind.

    Concerns Over AI Integration

    • Accuracy and Reliability: Many journalists expressed concerns about the accuracy of AI outputs, noting that AI could sometimes provide outdated or misleading information.
    • Cognitive Impact: There were fears that heavy use of AI could erode critical thinking and creativity, leaving journalists overly dependent on technology for information and content generation.
    • Job Security: Concerns about potential job losses due to AI automation were prominent, particularly in a country with high unemployment rates.

    Institutional Stance

    The study found that most news organizations in Bangladesh did not have formal policies regarding the use of AI. The absence of institutional support for training and managing risks related to AI adoption represents a significant gap that contrasts sharply with practices in the West.

    Conclusion

    The research concludes that while Bangladeshi journalists are quickly adopting Generative AI, the lack of institutional guidance and structured policies raises ethical and operational concerns. It suggests that adoption in Bangladesh is driven by different motivations than in the West: professional necessity rather than organizational mandates. The study’s implications highlight the need for contextual adaptations of technology acceptance theories, particularly in developing regions.

    Proposed Modifications to UTAUT for Non-Western Contexts

    1. Facilitating Conditions: The role of institutional support may not be as critical as previously thought in predicting AI adoption.
    2. Social Influence: Informal peer pressures play a significant role in the adoption process within journalistic settings.
    3. Voluntary-Compulsion Spectrum: Journalists may adopt AI out of professional necessity rather than voluntary choice, reflecting the unique pressures of the Bangladeshi media landscape.

    In summary, this study emphasizes the complex dynamics of Generative AI adoption in Bangladeshi journalism, highlighting both the advancements and challenges faced by journalists in a developing country context.