Hak-kyu Kim

Web Scraping Engineer
Icheon

Summary

Data Engineer with a proven track record at Remember & Company, specializing in designing efficient ETL pipelines and advanced web scraping solutions. Expert in Python programming and adept at optimizing data workflows to improve data integrity, streamline processes, and ensure reliable data collection.

Overview

7 years of professional experience

Work History

Data Engineer

Remember & Company
04.2024 - Current
  • Implemented Apache Airflow for scheduling, managing, and monitoring business data scraping workflows. Enhanced data processing efficiency by utilizing PySpark within the AWS Glue environment to transform the scraped data.
  • Built web scrapers with libraries such as curl_cffi, Playwright, and SeleniumBase. Frequently deployed these scrapers as AWS Lambda functions using Docker container images, managing the infrastructure with the Serverless Framework and AWS SAM.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Simplified complex data extraction tasks by building efficient web scraping bots using advanced selectors and pattern matching techniques.
  • Streamlined data collection processes by implementing efficient web scraping techniques and tools.
  • Python, Apache Airflow, AWS Glue, AWS Lambda
  • https://corp.remember.co.kr/

Web Scraping Engineer

SpaceOddity
10.2022 - 03.2024
  • Developed an efficient web scraping bot to collect various activity metrics for K-pop artists from multiple sources, including Instagram, X (formerly Twitter), and YouTube. Also monitored and maintained the bot to ensure reliable data collection.
  • Streamlined data collection by automating web scraping tasks in Python.
  • Python, AWS Lambda
  • https://www.spaceoddity.me/

Web Crawler Developer

MISSGO Inc
03.2022 - 10.2022
  • Engineered web scraping solutions to collect comprehensive real estate data for the Korean market.
  • Python, AWS Lambda, Node.js
  • https://www.missgoauction.com/?main=home&screen=wide

Software Engineer

AUTO MARKETING
08.2020 - 01.2022
  • Investigated and developed innovative web automation programs to enhance Search Engine Optimization (SEO) strategies.
  • Designed and implemented a C# Windows GUI application for data collection, providing users with customizable options and direct execution control.
  • Evaluated the feasibility of web scraping with Python, implemented a proof of concept, and then ported the logic to C# for use in the GUI application.
  • Python, C#
  • https://www.automarketing.co.kr/

Software Developer

UpennSolution
12.2018 - 04.2020
  • Led a project to develop data scraping programs targeting a broad market segment comprising government, corporate, and individual clients.
  • Participated in the early stages of developing a web scraping SaaS platform and concentrated on building the core logic to deliver standardized output to clients regardless of input variations.
  • Python
  • https://blog.spiderkim.com/

Education

Bachelor of Arts - Business

Academic Credit Bank System
Seoul
04.2001 -

No Degree - Data Engineering Nanodegree

Udacity
Mountain View, California
04.2001 -

Skills

Web Scraping

Data ETL pipeline design

Expert Python programming

Network Packet Analysis
