IL-SEOK HAN

Summary

Applied AI Scientist with over 3 years of experience designing and implementing AI/ML solutions in the B2B SaaS domain. Experienced in developing advanced machine learning models, including transformers and generative AI, with a strong grounding in NLP, data processing pipelines, and autonomous driving algorithms. Proven ability to lead research projects, build AI development environments, and collaborate across multidisciplinary teams to refine product strategy and improve operational efficiency. Solid foundation in both theoretical and applied AI, with a focus on turning emerging technologies into practical business solutions. Recognized for contributions to product development, anomaly detection systems, and multi-map pathfinding algorithms, backed by multiple awards in AI and autonomous driving competitions.

Overview

5 years of professional experience

2 Certifications

Work History

Applied AI Scientist

Doodlin
03.2022 - Current

1. Research Process and AI Development Environment Setup


1) I am one of Doodlin's early members. From the company's inception, no one was dedicated to ML/AI research, and the necessary environment and processes had not been established. This lack of infrastructure not only hampered work efficiency but also made it difficult to correct the research direction when it went astray.


2) To address this, I have continuously conducted paper reviews with our CTO, who has relatively less background in AI/ML technologies. I have also actively communicated with our POs/PMs about the potential impacts and risks of AI/ML-based probabilistic models on our products, and about which plans were feasible and which were not. This collaboration allowed me to gather perspectives on research direction from various roles, which was instrumental in refining our research focus.


3) Furthermore, I designed task metrics that are closely related to product performance, including both low-level metrics like F1 score and high-level metrics linked to actual user satisfaction. Although we have not yet achieved MLOps-level automation, establishing a process for testing new models is a significant milestone.
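As a minimal illustration of the low-level metrics mentioned above, the F1 score combines precision and recall of, say, field extractions (the counts below are hypothetical, not figures from the actual product):

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 80 correct extractions, 10 spurious, 20 missed
print(round(f1_score(80, 10, 20), 4))  # → 0.8421
```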


2. 2D Structured LLM Research

- Resume Key Information Extraction

- Sensitive Information Masking within Resumes


1) Resumes and similar documents are harder for LLMs (Large Language Models) to understand than plain text, because the reading order and meaning can vary with the document's layout. To address this, I conducted research based on models such as LayoutLM and BROS.


2) Additionally, resumes often contain a large number of tokens, sometimes reaching 4,000-8,000 in a single document. Traditional transformer-based models struggle with such documents because computational and memory costs grow quadratically with the number of tokens. To mitigate this, I researched models based on Longformer. I also re-trained the tokenizer to build a vocabulary better suited to resumes, reducing the number of tokens required.


3) Taking into account the specific characteristics of resumes and this prior research, I designed a custom transformer model named "Wideformer" and used it to perform these tasks.


4) With these functionalities, HR managers can now upload resumes and register candidates without manually typing in the information. Moreover, the system allows for the extraction of structured data like work experience and education from unstructured resume data, enabling more effective candidate filtering and analysis.
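A toy sketch of why the domain-specific vocabulary in point 2) above reduces token counts: with a greedy longest-match tokenizer, adding a frequent resume word to the vocabulary collapses several subword pieces into one token. (The tokenizer and vocabularies here are hypothetical simplifications, not the production tokenizer.)

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization; single characters are the fallback."""
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(len(text) - i, max_len), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

generic = {"en", "gin", "eer", "ing"}   # generic subword pieces
domain = generic | {"engineering"}      # vocabulary extended with a resume-frequent word

print(len(tokenize("engineering", generic)))  # → 4 pieces
print(len(tokenize("engineering", domain)))   # → 1 piece
```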


3. Recommendation Algorithm

- Acceptance Rate Prediction

- Resume/Job Matching


1) I researched methods to learn correlations by utilizing embeddings generated by LLMs with Deep Metric Learning and Ranking models. Although still in the experimental phase, we are preparing to implement these features in our products.
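At its simplest, ranking candidates by embedding similarity can be sketched as below; the embeddings here are tiny hypothetical vectors, whereas in practice they would come from an LLM encoder trained with metric-learning or ranking objectives.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_candidates(job_vec, candidates):
    """Return candidate ids sorted by embedding similarity to the job posting."""
    return sorted(candidates, key=lambda c: cosine(job_vec, candidates[c]), reverse=True)

# hypothetical embeddings
job = [1.0, 0.0, 1.0]
cands = {"cand_a": [0.9, 0.1, 0.8], "cand_b": [0.0, 1.0, 0.0]}
print(rank_candidates(job, cands))  # cand_a ranks first
```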


4. Generative AI

- Job Posting and Sourcing Message Generation


1) I performed fine-tuning on the LLaMA 3 8B model using quantization and techniques like LoRA. This resulted in successfully generating natural job postings and sourcing messages.


2) However, what customers desire are job postings that attract more candidates and sourcing messages that receive better responses. Therefore, I am currently conducting research to fine-tune the model using an objective function aligned with these goals.
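The core idea behind LoRA, as used in the fine-tuning above, is that the frozen weight is augmented by a low-rank update B·A that is much cheaper to train. A minimal numeric sketch (toy 2x2 weight, rank 1; not the actual LLaMA weights or PEFT code):

```python
def matvec(M, v):
    """Matrix-vector product over nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale):
    """Frozen path W @ x plus the low-rank update B @ (A @ x), scaled by alpha/r."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
A = [[1.0, 1.0]]              # rank-1 down-projection (r=1)
B = [[0.5], [0.5]]            # rank-1 up-projection
print(lora_forward([2.0, 4.0], W, A, B, scale=1.0))  # → [5.0, 7.0]
```

Only A and B are trained, so the number of trainable parameters scales with the rank r rather than with the full weight size.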


5. Technology Internalization

- Detection of Personal and Sensitive Information

I conducted research on detecting sensitive information within resumes using token-level classification.
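Once a token-level classifier has labeled each token, masking reduces to replacing flagged tokens. A minimal sketch (the tokens, label scheme, and placeholder below are illustrative, not the production tag set):

```python
def mask_sensitive(tokens, labels, placeholder="[MASKED]"):
    """Replace tokens whose predicted label marks sensitive information ('O' = outside)."""
    return [placeholder if lab != "O" else tok for tok, lab in zip(tokens, labels)]

tokens = ["Contact", ":", "jane@example.com", ",", "born", "1990"]
labels = ["O", "O", "B-EMAIL", "O", "O", "B-BIRTH"]
print(" ".join(mask_sensitive(tokens, labels)))
# → Contact : [MASKED] , born [MASKED]
```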

- OCR


1) For resumes in image format, text is extracted using OCR. We initially used ClovaOCR, but switched to PaddleOCR (an open-source library) for text detection, which is based on the EAST model and already offered high accuracy. For text recognition, we trained a custom model based on DTrOCR.


2) The model demonstrated accuracy levels similar to ClovaOCR at high resolutions but significantly improved performance in low-resolution scenarios, with accuracy increasing from 88.45% to 98.51%. For more complex texts, such as those with emails, accuracy increased from 23.08% to 97.64%.

Software Engineer

Bear Robotics
02.2020 - 12.2020

1. Autonomous Driving Algorithm Development
- Multi Map Pathfinding Algorithm Development

1) With a low-level pathfinding algorithm already in place, I developed the multi-map functionality by merging multiple maps according to their orientations and then computing routes across them. Although the basic idea came from a senior developer, I identified and resolved coordinate-misalignment issues that arose during the merging process.
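Merging maps with different orientations comes down to transforming each map's local coordinates into a shared frame with a rotation plus a translation. A minimal sketch (the origin and angle below are hypothetical values, not real map data):

```python
import math

def to_global(point, origin, theta):
    """Rotate a map-local point by theta, then translate by the map's origin
    to express it in the shared (merged) frame."""
    x, y = point
    gx = origin[0] + x * math.cos(theta) - y * math.sin(theta)
    gy = origin[1] + x * math.sin(theta) + y * math.cos(theta)
    return (round(gx, 6), round(gy, 6))

# map B rotated 90 degrees and offset (10, 0) relative to map A's frame
print(to_global((1.0, 0.0), origin=(10.0, 0.0), theta=math.pi / 2))  # → (10.0, 1.0)
```

Getting the rotation and translation order wrong is exactly the kind of misaligned-coordinate bug described above.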

2. Data Loading and Analysis
- Driving Data Loading Pipeline Setup

1) To ensure the driving data logs were preserved in cases where the robot went offline or the power was turned off during operation, I wrote code that logged the driving data locally on the robot using system calls. This code also ensured that the logs were not duplicated or lost when the robot was rebooted.

2) I set up a pipeline to aggregate this data into a driving data table and designed the schema for this table.
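The de-duplication idea above can be sketched with an append-only log plus a byte-offset checkpoint: records are fsync'd locally, and the uploader advances the checkpoint only after a successful read, so a reboot resumes without re-sending or losing lines. (File names and record fields here are hypothetical; the real implementation used system calls on the robot.)

```python
import json
import os
import tempfile

def append_records(log_path, records):
    """Append driving-data records; fsync so a power cut cannot lose flushed lines."""
    with open(log_path, "a") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
        f.flush()
        os.fsync(f.fileno())

def drain_new(log_path, ckpt_path):
    """Return only records written since the checkpoint, then advance it."""
    offset = int(open(ckpt_path).read()) if os.path.exists(ckpt_path) else 0
    with open(log_path) as f:
        f.seek(offset)
        lines = f.readlines()
        new_offset = f.tell()
    with open(ckpt_path, "w") as f:
        f.write(str(new_offset))
    return [json.loads(line) for line in lines]

# demo
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "drive.log")
ckpt = os.path.join(workdir, "drive.ckpt")
append_records(log, [{"t": 0, "speed": 1.0}])
print(drain_new(log, ckpt))  # first drain returns the record
append_records(log, [{"t": 1, "speed": 1.1}])
print(drain_new(log, ckpt))  # second drain returns only the new record
```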

- Anomaly Detection Algorithm Based on Driving Data

1) I applied algorithms like IQR (Interquartile Range) and LOF (Local Outlier Factor) to the driving data table mentioned above. I conducted research and comparisons of applicable algorithms and implemented the most suitable ones. This enabled Robot Operators to detect robots that were suspected of having issues.
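The IQR rule mentioned above flags values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences). A minimal sketch on hypothetical speed readings, not actual robot data:

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] using linear-interpolated quartiles."""
    s = sorted(values)

    def quantile(q):
        pos = q * (len(s) - 1)
        lo = int(pos)
        frac = pos - lo
        return s[lo] + (s[min(lo + 1, len(s) - 1)] - s[lo]) * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

speeds = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0]  # one robot far off nominal
print(iqr_outliers(speeds))  # → [5.0]
```

Unlike IQR, LOF compares each point's local density to that of its neighbors, which catches outliers that global fences miss.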

Education

Bachelor of Science - Physics, Computer Science Double Major

Sungkyunkwan University
Seoul, South Korea
04.2001 -

Skills

PyTorch

Machine Learning

NLP

NumPy

Pandas

Python

Git

Elasticsearch

AWS

Certification

Software Maestro 12th Cohort, Ministry of Science and ICT

Awards

  • Blockchain Grand Challenge, Special Award, 2022-11
  • Ministry of Science and ICT, Excellence Award, Space Weather Disaster AI Competition, 2019-10
  • PAMS University Autonomous Driving Car Competition, Excellence Award, 2019-09
  • Autonomous Driving Algorithm Hackathon, Excellence Award, 2019-06
  • Campus Town Startup Competition, Grand Prize, 2019-05
  • Sharing Economy Hackathon, Grand Prize, 2019-02

Timeline

Method for Generating Verifiable Random Numbers, Patent Application

09-2023

Software Maestro 12th Cohort, Ministry of Science and ICT

12-2021

Software Engineer

Bear Robotics
02.2020 - 12.2020

Bachelor of Science - Physics, Computer Science Double Major

Sungkyunkwan University
04.2001 -

Applied AI Scientist

Doodlin
03.2022 - Current