Late Lucid Lectures Guild

Science, softly spoken.

E-commerce

  • Unlocking E-commerce Success with MOON: Advanced Multimodal Representation Learning

    MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising

    By Chenghan Fu, Daoze Zhang, Yukang Lin, Zhanheng Nie, Xiang Zhang, Jianyu Liu, Yueran Liu, Wanxian Guan, Pengjie Wang, Jian Xu, Bo Zheng

    DOI https://doi.org/10.48550/arXiv.2511.11305

    Abstract

    We introduce MOON, our comprehensive set of sustainable iterative practices for multimodal representation learning for e-commerce applications. MOON has already been fully deployed across all stages of Taobao search advertising system, including retrieval, relevance, ranking, and so on. The performance gains are particularly significant on click-through rate (CTR) prediction task, which achieves an overall +20.00% online CTR improvement. Over the past three years, this project has delivered the largest improvement on CTR prediction task and undergone five full-scale iterations. Throughout the exploration and iteration of our MOON, we have accumulated valuable insights and practical experience that we believe will benefit the research community. MOON contains a three-stage training paradigm of “Pretraining, Post-training, and Application”, allowing effective integration of multimodal representations with downstream tasks. Notably, to bridge the misalignment between the objectives of multimodal representation learning and downstream training, we define the exchange rate to quantify how effectively improvements in an intermediate metric can translate into downstream gains. Through this analysis, we identify the image-based search recall as a critical intermediate metric guiding the optimization of multimodal models. Over three years and five iterations, MOON has evolved along four critical dimensions: data processing, training strategy, model architecture, and downstream application. The lessons and insights gained through the iterative improvements will also be shared. As part of our exploration into scaling effects in the e-commerce field, we further conduct a systematic study of the scaling laws governing multimodal representation learning, examining multiple factors such as the number of training tokens, negative samples, and the length of user behavior sequences.

    Introduction

    The report introduces MOON, a set of sustainable iterative practices for multimodal representation learning in e-commerce. MOON has been deployed across all stages of the Taobao search advertising system, where it contributed an overall +20.00% improvement in online CTR. The report stresses the growing importance of multimodal data (images and videos alongside text) in CTR prediction, rather than relying on text alone.

    The authors trace their investigation back to 2022 and explain why they expected CTR prediction to benefit from multimodal understanding. They acknowledge early difficulties with existing end-to-end training approaches, which led them to a multi-stage, decoupled integration of multimodal representations with downstream models.

    Background

    1. Multimodal Content Integration: Users interact more meaningfully with visually engaging content, hence the need for integrating multimodal elements into models.
    2. End-to-End Paradigm Limitations: Initial tests with basic end-to-end approaches revealed deficiencies, prompting a shift toward a multi-stage methodology to enhance model performance.

    Findings and Contributions

    Key findings and contributions of the MOON report include:

    • Three-Stage Training Paradigm: The architecture follows a “Pretraining, Post-training, and Application” methodology.
    • Image-Based Search Recall: Identified as a critical intermediate performance metric guiding the training of multimodal models.
    • Iterative Improvements: Across five iterations, the authors gained insights into data processing, training strategy, model architecture, and downstream application.
    • Scalable Infrastructure: A dedicated infrastructure was developed to support the life cycle of multimodal representations, enhancing efficiency and real-time interactions.

    The authors also conducted a systematic study of the scaling laws governing multimodal representation learning, examining factors such as the number of training tokens, the number of negative samples, and the length of user behavior sequences. These results inform practical guidelines for optimizing training while keeping models effective in real-world use.
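
    To make the two metrics named above concrete, here is a minimal Python sketch. It is not the authors' code. The recall@k function follows the standard retrieval definition. The summary does not give a formula for the exchange rate, so the simple ratio below (downstream gain divided by intermediate-metric gain) is an assumption, and the item identifiers are invented toy data.

```python
from __future__ import annotations

from typing import Sequence


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def exchange_rate(delta_downstream: float, delta_intermediate: float) -> float:
    """Assumed ratio form: downstream gain per unit gain in the intermediate metric."""
    if delta_intermediate == 0:
        raise ValueError("the intermediate metric did not change")
    return delta_downstream / delta_intermediate


# Toy data, invented for illustration only.
ranked = ["item_3", "item_7", "item_1", "item_9"]  # results of an image-based search
relevant = {"item_1", "item_7", "item_8"}           # ground-truth matches for the query

recall = recall_at_k(ranked, relevant, k=3)         # two of three relevant items found
rate = exchange_rate(delta_downstream=0.5, delta_intermediate=2.0)
print(recall, rate)
```

    Under this reading, an intermediate metric with a high and stable exchange rate is a useful optimization target, because gains on it are more likely to carry over to the downstream CTR model.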

    Conclusion

    The MOON report concludes by summarizing the significant achievements of the MOON methodology, emphasizing its successful implementation across various stages of Taobao’s systems. It highlights the growth trajectory of the project and its implications for future work in enhancing all facets of e-commerce applications beyond CTR prediction. The insights derived are expected to inspire subsequent advancements in recommendation and advertising systems, further solidifying the link between advanced modeling techniques and e-commerce performance.

    Future Work Directions

    • Data Expansion: Plans to broaden data coverage for various scenarios and modalities.
    • Training Paradigms: Investigating multi-stage and multi-task training techniques.
    • Infrastructure Development: Enhancements aimed at improving training and inference efficiency for larger models.

    By sharing their iterative experiences, the authors hope to foster progress and collaboration within the research community, reaffirming the importance of multimodal representation learning in shaping the future of e-commerce.

  • Revolutionizing E-Commerce with AI: Automated Product Knowledge Graph Construction

    AI Agent-Driven Framework for Automated Product Knowledge Graph Construction in E-Commerce

    By Dimitar Peshevski, Riste Stojanov, Dimitar Trajanov

    DOI https://doi.org/10.48550/arXiv.2511.11017

    Abstract

    The rapid growth of e-commerce platforms has led to an overflow of unstructured product data, which poses challenges for information retrieval, recommendation systems, and data analytics. Knowledge Graphs, which are structured representations of data, are crucial for organizing this information. However, constructing product-specific Knowledge Graphs is often a manual and complex task. This paper presents an automated framework powered by Artificial Intelligence agents to create Knowledge Graphs using unstructured product descriptions. The proposed method is divided into three stages—ontology creation and expansion, ontology refinement, and Knowledge Graph population—utilizing Large Language Models. The evaluation on a dataset of air conditioner descriptions shows the framework’s high effectiveness, achieving over 97% property coverage and demonstrating its scalability for intelligent product data integration.

    Introduction

    E-commerce and retail platforms are generating significant amounts of unstructured product information, such as descriptions, specifications, and reviews. To utilize this data for applications like product recommendations and analytics, it must be structured into a machine-readable form. Knowledge Graphs help achieve this by representing entities (like products) and their relationships in a graph format.

    Despite their utility, creating Knowledge Graphs is typically a manual and labor-intensive process that requires domain-specific knowledge. This paper introduces an automated framework utilizing AI agents to construct Knowledge Graphs specifically for product domains. By employing Large Language Models, the framework automates the creation and refinement of product ontologies and directly generates Knowledge Graphs from product descriptions.

    Methodology

    The framework consists of three major stages:

    1. Ontology Creation and Expansion: The process starts by sampling product descriptions to identify essential ontology elements, like product classes and attributes, and organizing them into a structured format. This stage iteratively incorporates more product samples to expand the ontology by adding new classes or properties.

    2. Ontology Refinement: This stage enhances the initial ontology using the capabilities of Large Language Models. It addresses any issues of redundancy, generality, or clarity within the ontology to improve its usability and flexibility across different product types.

    3. Knowledge Graph Population: The last stage populates the Knowledge Graph with product data extracted from the descriptions. It generates RDF (Resource Description Framework) triples that represent the relationships and attributes of each product. The framework is designed to represent only what the descriptions state and to avoid generating incorrect information.
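
    The three stages can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the LLM agents are replaced by a trivial parser for "key: value" text, refinement is reduced to name normalization, and the http://example.org/ namespace and all function names are hypothetical.

```python
from __future__ import annotations

EX = "http://example.org/"  # hypothetical namespace, not from the paper


def extract_attributes(description: str) -> dict[str, str]:
    """Stand-in for the LLM agent: parses 'key: value; key: value' text."""
    pairs = (part.split(":", 1) for part in description.split(";") if ":" in part)
    return {key.strip(): value.strip() for key, value in pairs}


def build_ontology(descriptions: list[str], batch_size: int = 2) -> set[str]:
    """Stage 1: grow the set of property names batch by batch."""
    properties: set[str] = set()
    for start in range(0, len(descriptions), batch_size):
        for description in descriptions[start:start + batch_size]:
            properties |= set(extract_attributes(description))
    return properties


def refine_ontology(properties: set[str]) -> dict[str, str]:
    """Stage 2: map raw names to canonical ones so near-duplicates merge."""
    return {raw: raw.lower().replace(" ", "_") for raw in properties}


def populate(product_id: str, description: str, canonical: dict[str, str]) -> list[str]:
    """Stage 3: emit N-Triples lines, only for properties known to the ontology."""
    triples = []
    for raw, value in extract_attributes(description).items():
        if raw in canonical:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(
                f'<{EX}product/{product_id}> <{EX}{canonical[raw]}> "{escaped}" .'
            )
    return triples


# Toy descriptions, invented for illustration only.
descriptions = [
    "Cooling Capacity: 3.5 kW; Energy Class: A++",
    "cooling capacity: 2.6 kW; Noise Level: 21 dB",
]
ontology = refine_ontology(build_ontology(descriptions))
triples = populate("ac-1", descriptions[0], ontology)
print("\n".join(triples))
```

    Restricting population to properties already in the ontology, and to values literally present in the description, is one simple way to keep the graph from containing statements the source text does not support.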

    Evaluation

    The authors evaluated the framework on a dataset consisting of 291 product descriptions for air conditioners. The evaluation focused on three key areas:

    • Ontology Coverage: It measured how completely the ontology captured product classes, attributes, and relationships.
    • Ontology Quality: This involved a qualitative assessment of coherence, generality, and usability.
    • Knowledge Graph Population: They assessed the number of generated RDF triples and how many properties from the ontology were instantiated in the Knowledge Graph.

    The results showed that the framework constructed a modular and comprehensive ontology covering 42 classes and 69 properties. It processed 282 of the 291 descriptions and achieved a property coverage of 97.1%, meaning that nearly all ontology properties were instantiated in the Knowledge Graph.
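
    Property coverage, as described in the evaluation criteria above, is the share of ontology properties that are instantiated at least once in the populated graph. It is a different quantity from the share of descriptions processed (282 of 291 is about 96.9%). The sketch below shows one way to compute it, assuming triples are held as (subject, predicate, object) tuples. The property names and triples are invented toy data, not the paper's results.

```python
from __future__ import annotations


def property_coverage(
    ontology_properties: set[str],
    triples: list[tuple[str, str, str]],
) -> float:
    """Share of ontology properties used as a predicate in at least one triple."""
    if not ontology_properties:
        return 0.0
    used = {predicate for _, predicate, _ in triples}
    return len(used & ontology_properties) / len(ontology_properties)


# Toy ontology and graph, invented for illustration only.
ontology_properties = {"brand", "cooling_capacity", "noise_level", "energy_class"}
graph = [
    ("ac-1", "brand", "Acme"),
    ("ac-1", "cooling_capacity", "3.5 kW"),
    ("ac-2", "brand", "Acme"),
    ("ac-2", "noise_level", "21 dB"),
]

coverage = property_coverage(ontology_properties, graph)
print(f"{coverage:.1%}")
```

    Predicates that are not part of the ontology are ignored by the intersection, so stray properties in the graph cannot inflate the score.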

    Conclusion and Future Work

    The proposed AI agent-driven framework represents a significant advancement in automating the construction of Knowledge Graphs for e-commerce. It effectively eliminates the need for manual processes, allowing faster adaptability to new products.

    Future enhancements could include integrating various types of data (like images and user reviews) to enrich the Knowledge Graph further. Additionally, efforts could be directed towards improving the accuracy of data extraction and expanding the framework’s application to other domains, such as finance or healthcare.

    The framework promises to lay a strong foundation for advanced applications in e-commerce, such as improved product recommendations and search functionality.