About Us

About MIMIC Lab

The Multimodal Interactive Machine Intelligence Creation Laboratory is dedicated to advancing human-interactive, multimodal AI.

Our focus is on creating AI that can understand, communicate with, and empathize with humans.

Taehoon Kim

Education

2018 - 2021 Ph.D. (integrated M.S./Ph.D. program) in Computer Science, Sogang University

2012 - 2018 B.S. in Computer Science & Communications, Sogang University

Career

Aug 2024 - Present Assistant Professor, Graduate School of Metaverse, Sogang University

Mar 2021 - Aug 2024 Research Scientist, Vision Lab, LG AI Research

Feb 2020 - Jan 2021 Research Intern, Clova AI, Naver Corp.

Jan 2017 - Dec 2017 Machine Learning Engineer, Nosith Inc.

Field of Interest

• General machine learning, computer vision, and large-scale model training.

• Specialized in large multimodal models (LMMs), vision-language modeling, quantization, and network architecture design.

• Application of machine learning algorithms to various multimodal and computer vision tasks.

Projects

Reliable Egocentric Multimodal AI Agent (NRF Early Career Research)

• Research supported by the National Research Foundation Early Career Research Program (우수신진연구).

• Developing a hallucination-free on-device egocentric multimodal AI agent with self-correction capabilities, designed to operate in real-world human environments.

• Proposing an Actor-Validator architecture with RLAIF-based alignment, where a lightweight actor generates responses and a high-capacity validator evaluates factual consistency, logical coherence, and social appropriateness.

• Designing a Social-Context Hallucination Benchmark to quantitatively measure and reduce hallucinations in egocentric multimodal settings, targeting hallucination rates below 10%.

• Developing multimodal data augmentation pipelines using large multimodal models to improve robustness in rare and socially complex scenarios.

• Implementing Dynamic Precision Quantization (DPQ) and Quantization-Aware Training (QAT) to enable real-time on-device inference under strict resource constraints (≤4 GB memory, ≤100 ms latency).

• Integrating privacy-preserving mechanisms based on latent-space anonymization to ensure safe handling of egocentric visual and audio data.

• Targeting deployment on edge devices such as Jetson Orin and NPU platforms, enabling fully on-device, privacy-preserving, and low-latency AI agents for AR/XR and wearable applications.
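
To give a sense of scale for the 4 GB memory constraint mentioned above, here is a back-of-the-envelope sketch of weight-only storage at different bit widths. The 3-billion-parameter model size and the names `weight_memory_gb` and `fits_budget` are illustrative assumptions, not figures or code from the project; activations, caches, and runtime overhead are ignored.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_budget(num_params: int, bits_per_weight: int, budget_gb: float = 4.0) -> bool:
    """Check whether the weights alone fit within the memory budget."""
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb


# Hypothetical model size, chosen only for illustration.
HYPOTHETICAL_PARAMS = 3_000_000_000

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(HYPOTHETICAL_PARAMS, bits)
        verdict = "fits" if fits_budget(HYPOTHETICAL_PARAMS, bits) else "exceeds"
        print(f"{bits:>2}-bit weights: {size:.1f} GB ({verdict} a 4 GB budget)")
```

Under these assumptions, 16-bit weights alone would need 6 GB, while 8-bit and 4-bit weights would need 3 GB and 1.5 GB, which is one reason low-precision quantization is attractive for on-device deployment.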

Reliable Generative AI via Validator LLM

• Academic research supported by the NVIDIA Academic Grant Program.

• Developing a Validator LLM framework to evaluate and enforce logical consistency in generative AI outputs, addressing hallucination and reasoning errors in large language models.

• Designed a dual-model architecture where an actor LLM generates responses and a validator LLM assesses reasoning validity, enabling iterative refinement through reinforcement learning from AI feedback (RLAIF).

• Exploring multi-pass reasoning and cross-model verification to improve robustness and trustworthiness of generated explanations.

• Implementing the system using the NVIDIA AI stack, including the NeMo Framework and TensorRT-LLM, for scalable, production-ready deployment.
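
The actor-validator interplay described above can be sketched as a simple inference-time refinement loop. This is a minimal illustration under stated assumptions, not the project's implementation: `refine`, `toy_actor`, and `toy_validator` are hypothetical names, both models are stubbed as plain Python callables, and the RLAIF training step (where validator verdicts would serve as a reward signal) is not shown.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: real systems would wrap LLM calls here.
Actor = Callable[[str, str], str]                   # (prompt, feedback) -> response
Validator = Callable[[str, str], Tuple[bool, str]]  # (prompt, response) -> (is_valid, feedback)


def refine(prompt: str, actor: Actor, validator: Validator,
           max_rounds: int = 3) -> Tuple[str, bool, int]:
    """Ask the actor for a response and retry with validator feedback.

    Returns (last_response, accepted, rounds_used).
    """
    feedback = ""
    response = ""
    for round_idx in range(1, max_rounds + 1):
        response = actor(prompt, feedback)
        accepted, feedback = validator(prompt, response)
        if accepted:
            return response, True, round_idx
    return response, False, max_rounds


def toy_actor(prompt: str, feedback: str) -> str:
    """Stub actor: answers wrongly at first, corrects itself once given feedback."""
    return "2 + 2 = 4" if feedback else "2 + 2 = 5"


def toy_validator(prompt: str, response: str) -> Tuple[bool, str]:
    """Stub validator: accepts only responses that end with the right answer."""
    if response.endswith("4"):
        return True, ""
    return False, "The arithmetic is inconsistent; recheck the sum."


if __name__ == "__main__":
    print(refine("What is 2 + 2?", toy_actor, toy_validator))
```

Capping the loop with `max_rounds` keeps latency bounded and returns the last response with `accepted=False` when the validator never approves, so the caller can decide how to handle an unverified answer.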

Egocentric Multimodal AI Agent

• Academic partnership with Project Aria, Meta Reality Labs (ongoing).

• Developing an egocentric multimodal AI agent that leverages real-time visual inputs from Aria glasses, integrating camera streams, visual SLAM, and eye-tracking data to enable personalized and context-aware interactions.

• Designing an end-to-end multimodal AI architecture optimized for egocentric perception, combining Large Multimodal Models (LMM) with Speech-to-Text (STT) and Text-to-Speech (TTS) for immersive, real-world applications.

Large Multimodal Model (LMM)

• Led the Image-to-Text LMM project (EXAONE Atelier Image-to-Text).

• Developed a Bidirectional Image-Text Transformer architecture for efficient large-scale vision-language model training.

• Optimized model inference and corresponding backend architecture for commercialization.

• Designed an end-to-end backend architecture for a general-purpose multimodal agent (EXAONE Atelier Multimodal) by integrating a large multimodal model (LMM) and a large language model (LLM) with instruction prompt engineering.

Quantization-Aware Training

• Cooperative project with CLOVA AI, Naver Corp.

• Developed two straightforward optimization methods, StatAssist and GradBoost, that enable quantization-aware training from scratch across various computer vision tasks: classification, object detection, semantic segmentation, and style transfer.

• Experiments across these tasks showed that the quantized models performed comparably to, and often better than, their floating-point baselines.
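
For readers unfamiliar with quantization-aware training, the core idea is to simulate low-precision arithmetic in the forward pass by quantizing values and immediately mapping them back to floats ("fake quantization"). The sketch below shows uniform affine fake quantization in plain Python. It is a generic illustration and not StatAssist or GradBoost; the function name `fake_quantize` is hypothetical, and the gradient handling used during training (commonly a straight-through estimator) is not shown.

```python
from typing import List


def fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    """Quantize values to unsigned num_bits integers, then dequantize to floats."""
    qmax = 2 ** num_bits - 1
    # Extend the range to include zero so that 0.0 is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return list(values)  # every value is zero; nothing to quantize
    scale = (hi - lo) / qmax          # float width of one integer step
    zero_point = round(-lo / scale)   # integer code that represents 0.0
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(0, min(qmax, q))      # clamp to the representable integer range
        out.append((q - zero_point) * scale)
    return out


if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
    for bits in (8, 4, 2):
        print(bits, [round(w, 4) for w in fake_quantize(weights, bits)])
```

Lower bit widths leave fewer representable levels, so the round-trip error grows; training with this simulated error in the loop is what lets a network adapt its weights to the target precision.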

Privacy Preserving Image Anonymization

• Project supported by an Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea Government (MSIT) (A Development of Deidentification Technique Based on Differential Privacy).

• Developed a latent-space-level image anonymization framework (PPAPNet & PPSGAN) based on Generative Adversarial Networks (GANs) and Differential Privacy to potentially protect images from Model Inversion Attacks.

• Experiments on various datasets showed that PPAPNet & PPSGAN can effectively convert a sensitive image into a high-quality and attack-immune synthetic image while preserving its utility as training data.
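
As a rough illustration of what latent-space perturbation under differential privacy can look like, the sketch below clips a latent vector and adds Laplace noise before it would be passed to a decoder. This is a generic Laplace-mechanism example under stated assumptions, not the PPAPNet or PPSGAN method: the function `anonymize_latent`, the per-coordinate clipping bound, and the choice of Laplace noise are all assumptions made for illustration, and the code is not hardened for production use.

```python
import random
from typing import List, Optional


def anonymize_latent(latent: List[float], epsilon: float, clip: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip each coordinate to [-clip, clip] and add Laplace noise.

    With per-coordinate clipping, any two clipped vectors differ by at most
    2 * clip * len(latent) in L1 norm, which sets the noise scale.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = [max(-clip, min(clip, z)) for z in latent]
    sensitivity = 2.0 * clip * len(clipped)
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with rate 1/b is Laplace(0, b).
    return [
        z + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for z in clipped
    ]


if __name__ == "__main__":
    latent = [0.2, -1.7, 0.9, 0.05]  # hypothetical encoder output
    print(anonymize_latent(latent, epsilon=5.0, rng=random.Random(0)))
```

A smaller `epsilon` gives stronger privacy but noisier latents, which is the privacy-utility trade-off that an anonymization framework of this kind has to balance.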

Address

GA507A, 35, Baekbeom-ro
Mapo-gu, Seoul 04107
Republic of Korea