Robin's Coding Story

Tech Hobbyist and Passionate Python Engineer @robinignacio

August 11, 2024

PBA Scraper Using Playwright and Pandas

Part of the photo is not mine. Credits: PBA site.

This project is a web scraper that extracts data from the PBA site, whose pages load their content dynamically. Built with the Playwright async API for Python and Pandas, it handles two distinct dynamic pages: one dedicated to Players and the other to Teams.

Key Features:

  • Dynamic Page Handling: The scraper is built to navigate and extract data from pages whose content is loaded dynamically.
  • Technology Stack: Playwright's async Python API does the scraping and Pandas handles the data manipulation, so scraping and data processing are integrated in one project.
  • Data Storage: All scraped data is stored locally in the script directory, making it easy to access and manage.
  • Easy to Explore: To get started, clone the repository and explore the code. The project is structured to be straightforward to modify and extend.

Whether you want to scrape dynamic web pages or learn how to use Playwright with Python, this project serves as a practical example.
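To give a feel for how Playwright and Pandas can fit together, here is a minimal sketch. It is not the project's actual code. The URL, the selector, the column names, and the function names are all hypothetical, and it assumes the data sits in table rows whose cells come back tab-separated when read as inner text.

```python
def rows_to_records(rows, columns):
    """Turn raw text rows into dicts keyed by column name, skipping malformed rows."""
    records = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if len(cells) == len(columns):
            records.append(dict(zip(columns, cells)))
    return records


async def scrape_rows(url, row_selector):
    """Open a page, wait for its dynamic rows to render, and return their cell texts."""
    # Imported lazily so rows_to_records can be used without a browser installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        # Wait until at least one row exists before reading the page.
        await page.wait_for_selector(row_selector)
        texts = await page.locator(row_selector).all_inner_texts()
        await browser.close()
    # Assumption: the cells of a table row are separated by tabs in its inner text.
    return [text.split("\t") for text in texts]


def save_records(records, path):
    """Load the records into a DataFrame and write them to a local CSV file."""
    import pandas as pd  # lazy import, same reason as above

    df = pd.DataFrame(records)
    df.to_csv(path, index=False)
    return df


async def main():
    # Hypothetical URL, selector, and column names; replace with the real ones.
    rows = await scrape_rows("https://example.com/players", "table tbody tr")
    records = rows_to_records(rows, ["name", "team", "position"])
    save_records(records, "players.csv")
```

Running `asyncio.run(main())` (after `import asyncio`) would scrape the hypothetical page and write `players.csv` next to the script. Keeping the row cleaning in a plain function makes it easy to check without launching a browser.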

    December 23, 2023

    Georgia Scraper using Playwright and Gspread

    This scraping project was assigned by a client from the State of Georgia, a real estate broker who needs foreclosure listings to use as leads. The data comes from a public notice website that uses a Captcha to prevent scraping, which posed a challenge for the scraper bot. The project also needed a regex-based method of collecting data from the public notice documents, because these documents do not follow a consistent format for addresses and owner information. I devised a solution using Playwright solvers that use audio processing to solve Google reCAPTCHA V2. The cleaned data is uploaded to Google Sheets, which is why the Gspread API is needed for this application. To learn more, see the video recording and documentation provided by the developer.

    The recorded demonstration is divided into 3 parts:
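As an illustration of the regex-based extraction described above, here is a minimal sketch. The two patterns, the sample wording, and the function name are assumptions made for this example, not the project's actual rules. Real notices vary far more and would need additional patterns.

```python
import re

# Hypothetical pattern: house number, one to four street-name words, a common
# street suffix, then "City, GA 12345".
ADDRESS_RE = re.compile(
    r"\b\d{1,6}\s+(?:[A-Za-z0-9.]+\s+){1,4}?"
    r"(?:Street|St|Road|Rd|Avenue|Ave|Drive|Dr|Lane|Ln|Court|Ct|Way|Boulevard|Blvd)\b\.?"
    r",\s*[A-Za-z ]+,\s*(?:GA|Georgia)\s+\d{5}",
    re.IGNORECASE,
)

# Hypothetical pattern: the name that follows "possession of", ending at a
# comma or at the word "or".
OWNER_RE = re.compile(r"possession of\s+(?P<owner>.+?)(?:,|\s+or\s)", re.IGNORECASE)


def extract_lead(notice_text):
    """Return the first address and owner found in a notice, or None for each."""
    # Collapse line breaks and repeated spaces so patterns can span wrapped lines.
    text = " ".join(notice_text.split())
    address = ADDRESS_RE.search(text)
    owner = OWNER_RE.search(text)
    return {
        "address": address.group(0) if address else None,
        "owner": owner.group("owner") if owner else None,
    }
```

A notice that matches neither pattern yields `None` for both fields, so unmatched documents can be set aside for manual review instead of producing bad leads.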

    PART I.

    PART II.

    PART III.

    October 21, 2023

    Google Sheet Data Merger with Unique Data worksheet

    This script automates the merging of brand files generated from the Locad platform, using the Google API and Python libraries. The automation requires a Google API service account, and the script uses an organizational API. Users process multiple brands, usually covering around a hundred thousand rows depending on the range of the report they need. The script runs on Google Colab, where the user needs to "Make a copy" of the script and edit some inputs. A README document is provided for developers who want to test the script.
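The merge-then-deduplicate idea can be sketched in plain Python. This is an illustration only. It assumes every brand file shares the same header row and that "unique" means dropping rows that are exact duplicates. The function names are hypothetical, and reading from and writing to Google Sheets is left out.

```python
def merge_sheets(sheets):
    """Concatenate the data rows of several sheets that share one header row.

    `sheets` maps a sheet name to a list of rows, with the header as the first row.
    """
    header = None
    merged = []
    for name, rows in sheets.items():
        if not rows:
            continue  # skip empty sheets
        sheet_header, *data = rows
        if header is None:
            header = sheet_header
        elif sheet_header != header:
            raise ValueError(f"{name}: header does not match the other sheets")
        merged.extend(data)
    return header, merged


def unique_rows(rows):
    """Drop exact duplicate rows while keeping the first occurrence in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Tracking seen rows in a set keeps deduplication linear in the number of rows, which matters at around a hundred thousand rows.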

    Please view the recorded LOOM VIDEO.

    April 13, 2023

    Machine Learning Diabetes Prediction Deployed in Django Framework

    Diabetes is a common disease that affects millions of people worldwide, and it can lead to very destructive complications. Early detection helps with prevention. With today's technology, we can predict whether an individual is diabetic using only previously gathered data. For this project, a machine learning algorithm was trained on that data and used to make predictions. I used the Django framework for deployment and to provide a fully functional data form for the independent and dependent variables. Please see the GitHub repository for the full documentation.

    May 8, 2023

    Rangerchat: Django And Ajax Real Time Chatapp

    Here is a real-time chat app for you and your friends. I built it with Django, using Ajax for database updates. Users can create and join chatrooms and see messages in real time without reloading the website. I'm going to integrate some of the code into a much larger project. Please check out this app. To run it on your own server, please check the documentation in the repository.
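One common way to show new messages without a page reload is for the page to ask the server at intervals for anything newer than the last message it has. The sketch below shows only that server-side selection logic and assumes this polling approach, which may differ from how Rangerchat actually works. The function names and the message shape are hypothetical.

```python
def messages_since(messages, last_seen_id):
    """Return the messages with an id greater than last_seen_id, oldest first."""
    newer = [message for message in messages if message["id"] > last_seen_id]
    return sorted(newer, key=lambda message: message["id"])


def poll_response(messages, last_seen_id):
    """Build the payload an Ajax poll would receive for one chatroom."""
    new_messages = messages_since(messages, last_seen_id)
    # If nothing is new, the client keeps its current position.
    last_id = new_messages[-1]["id"] if new_messages else last_seen_id
    return {"messages": new_messages, "last_id": last_id}
```

In a Django view, a dict like this could be returned with `JsonResponse`, and the browser would send the `last_id` it received back with its next request.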

    September 20, 2021

    Automated Contract Generator Using Python Docxtpl

    This is a sample Software Fair Use contract form in which all entries are filled in automatically through a simple GUI. I used the PySimpleGUI and docxtpl modules. This is a very useful tool for data entry, and it can be used with any documents, such as Microsoft Word and Google Docs.
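The fill-in step can be sketched as follows. This is a minimal illustration, not the tool's actual code. The field names, file names, and function names are hypothetical, and it assumes a `.docx` template that contains `{{ placeholder }}` tags matching the context keys. The GUI part is omitted.

```python
from datetime import date


def build_context(form_values, today=None):
    """Clean raw form values into the context dict used to fill the template."""
    context = {key: str(value).strip() for key, value in form_values.items()}
    # Hypothetical extra field: the signing date, formatted for the contract.
    context["date_signed"] = (today or date.today()).strftime("%B %d, %Y")
    return context


def render_contract(template_path, context, output_path):
    """Fill a .docx template with the context and save the result."""
    # Imported lazily so build_context works even without docxtpl installed.
    from docxtpl import DocxTemplate

    doc = DocxTemplate(template_path)
    doc.render(context)  # replaces the {{ placeholders }} with context values
    doc.save(output_path)
```

For example, `render_contract("template.docx", build_context({"licensee": "Jane Doe"}), "contract.docx")` would write a filled contract, provided the template defines `{{ licensee }}` and `{{ date_signed }}`.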

    April 1, 2022

    Webscraping Using Python Beautifulsoup

    I created an automated web scraper to speed up the data collection process. Once the code is written, it can fetch the information you want repeatedly and from many pages. Collecting the same data manually means a lot of time spent clicking, scrolling, and searching, especially if you need large amounts of information from websites that are regularly updated with new content. PLEASE ACCESS THE URL FOR THE SCRAPED WEBSITE HERE.

    March 2, 2023

    Hackerrank Solved Problems
    Python And SQL

    Highlighted here are the problems I solved on Hackerrank for Python and SQL. Most of them are data type problems and database queries. Coding exercises from other sites, such as Codility, are also included in this tab.