How Sharepon Battled and Evolved to Defend Against Bots

In this blog article, I’m going to share the dramatic stories of my battles against bots as Sharepon evolved. As a cautious developer who loves to prepare thoroughly, I made extensive efforts before launching my web app for the first time. Yet, despite my careful planning, I was still caught off guard. After Sharepon went live, it faced relentless attacks from bots, even though I hadn’t told anyone the web app was online. I’d like to share my journey and the solutions I implemented, hoping these insights can serve as both a warning and an inspiration.

This story begins long before Sharepon’s official launch. Shortly after completing the development of Sharepon’s basic functionalities, I shifted into an enhancement phase to make the app more performant. At that time, I understood the importance of protecting the coupons members share from web crawlers and bots. Sharepon’s success depends on the goodwill of warm-hearted members who share coupons with others. That’s also why the platform allows members to upload their payment QR codes to their homepages, enabling them to receive tips as rewards and motivation. If bots scrape and repost these shared coupons elsewhere, members could lose out on both the recognition and potential tips they deserve.

The first solution that came to mind was to build a rate limiter. Rate limiting ensures users can make only a limited number of requests within a given timeframe—for example, viewing up to 100 coupon codes per hour. This discourages bots and bad actors from scraping and redistributing coupons.

At the time, I had already implemented Redis for caching, so I decided to leverage it for rate limiting. To track the number of requests a user made within a specific time window (e.g., one hour), I associated each user’s IP address with a Redis key that stored their request count. For each incoming request, the count was incremented using Redis’s atomic INCR command, and an expiration time was set on the key to match the duration of the time window. If the count exceeded the predefined limit during the window, the request was rejected. This approach is simple, efficient, and leverages Redis’s in-memory data store and atomic operations to maintain reliability under high traffic. If you’re building your web app with Python Flask, I recommend checking out the flask_limiter package, which offers detailed documentation on rate limit setups, shared rate limits among resources, and more.
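
Here’s a minimal sketch of that fixed-window approach using the redis-py client; the key prefix, window length, and request limit are illustrative, not Sharepon’s actual values:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

WINDOW_SECONDS = 3600  # one-hour window
MAX_REQUESTS = 100     # e.g., up to 100 coupon views per window

def is_allowed(identifier: str) -> bool:
    """Fixed-window rate limit keyed on an IP address or user ID."""
    key = f"rate:{identifier}"
    count = r.incr(key)  # atomic INCR; creates the key at 1 if absent
    if count == 1:
        # First request of a new window: start the countdown.
        r.expire(key, WINDOW_SECONDS)
    return count <= MAX_REQUESTS
```

One caveat: if the process dies between INCR and EXPIRE, the key never expires. Wrapping both commands in a MULTI/EXEC pipeline or a short Lua script closes that gap.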

In my original design, I limited resource usage based on each visitor’s IP address. My logic was that it’s easier to switch between accounts than IP addresses, making IP-based limits more effective against bots. However, I soon realized this approach might harm legitimate users. For example, if students in a dormitory all use Sharepon, they would share the same public IP address as seen by the server, forcing them all to share a single rate limit. Sharepon is a community app, and I believe user experience should always come first. As a result, I updated the rate limiting logic: for visitors without accounts, limits were still tied to IP addresses, but for logged-in users, limits were tied to their user IDs.
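
With flask_limiter, that switch comes down to the key function. Below is a sketch under the assumption that your auth layer stashes the logged-in user on flask.g (the g.user attribute here is hypothetical; how you resolve the current user depends on your setup):

```python
from flask import Flask, g
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

def rate_limit_key() -> str:
    # Logged-in users are limited per user ID; anonymous visitors per IP.
    user = getattr(g, "user", None)  # hypothetical: set by your auth middleware
    return f"user:{user.id}" if user else get_remote_address()

limiter = Limiter(
    key_func=rate_limit_key,
    app=app,
    storage_uri="redis://localhost:6379",  # reuse the existing Redis instance
)

@app.route("/coupons/<int:coupon_id>")
@limiter.limit("100 per hour")
def view_coupon(coupon_id):
    ...
```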

Another important tip for implementing rate limiting: make your error handling for rate-limit responses clear and informative. Gracefully notify users when they hit a rate limit and explain what that limit is.
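
In Flask, flask_limiter rejects over-limit requests with a 429 Too Many Requests error whose description carries the violated limit, so a single error handler can cover every limited endpoint. A minimal sketch, assuming the app from the earlier snippets:

```python
from flask import jsonify

@app.errorhandler(429)
def rate_limit_exceeded(e):
    # e.description holds the limit that was hit, e.g. "100 per 1 hour".
    return jsonify(
        error="Too many requests",
        detail=f"Rate limit exceeded: {e.description}. Please try again later.",
    ), 429
```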

After implementing these changes, I moved on to other enhancements and eventually launched Sharepon several months later. Initially, I didn’t announce the launch, knowing I’d likely discover areas for improvement. I quietly tested the app myself and began refining it. To my surprise, my website was soon detected by bots and faced daily invasions.

By checking the backend database, I observed clear patterns in the bot activity. Bots would interact with almost every element on the homepage, triggering large numbers of search requests. They even registered accounts, using random strings as usernames, though the email addresses they provided appeared to be valid.

Fortunately, Sharepon requires email verification for account activation. Users must log into the email they use for registration and click a verification link to prove ownership. While this feature was designed to prevent misuse of others’ email addresses, it also turned out to be an effective barrier against bots. Out of the 1,000+ accounts registered by bots within two months, none passed the email verification step. Bots either didn’t own the email accounts they registered with or weren’t programmed to handle the email verification process.
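
For illustration, here is one common way to build such a verification link in Flask, signing the email address into a time-limited token with the itsdangerous package (the domain, salt, and expiry are placeholders, not Sharepon’s actual implementation):

```python
from itsdangerous import BadSignature, SignatureExpired, URLSafeTimedSerializer

serializer = URLSafeTimedSerializer(app.config["SECRET_KEY"])

def make_verification_link(email: str) -> str:
    token = serializer.dumps(email, salt="email-verify")
    return f"https://sharepon.example/verify/{token}"

def verify_token(token: str, max_age: int = 86400) -> str | None:
    """Return the verified email address, or None if invalid or expired."""
    try:
        return serializer.loads(token, salt="email-verify", max_age=max_age)
    except (BadSignature, SignatureExpired):
        return None
```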

However, there was an unintended consequence. Sharepon was sending out a large volume of verification emails daily, some of which went to valid addresses whose owners had never registered with the platform. Some recipients marked these emails as spam, damaging the sender reputation of Sharepon’s email accounts. Over time, nearly all emails sent by Sharepon started landing in users’ spam folders instead of their inboxes, a major setback for communicating with our users.

To further protect Sharepon, I integrated Google’s reCAPTCHA v3, which distinguishes between human and bot activity. Actions like login and registration now send API requests to Google for verification, and the Sharepon server processes a request only if the reCAPTCHA test passes. I chose reCAPTCHA v3 because it operates silently in the background, avoiding challenges like deciphering distorted text or matching images, tasks that can be frustrating even for humans. For those hosting web applications on Cloudflare, Cloudflare’s Turnstile is another great alternative to explore.
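
Server-side, the verification is a single POST to Google’s siteverify endpoint. A sketch; the 0.5 score threshold is a common starting point, not Sharepon’s tuned value, and you should adjust it for your own traffic:

```python
import requests

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
RECAPTCHA_SECRET = "your-secret-key"  # from the reCAPTCHA admin console
SCORE_THRESHOLD = 0.5  # v3 scores range from 0.0 (likely bot) to 1.0 (likely human)

def is_human(token: str, remote_ip: str | None = None) -> bool:
    """Validate a client-side reCAPTCHA token against Google's API."""
    resp = requests.post(
        VERIFY_URL,
        data={"secret": RECAPTCHA_SECRET, "response": token, "remoteip": remote_ip},
        timeout=5,
    )
    result = resp.json()
    return result.get("success", False) and result.get("score", 0.0) >= SCORE_THRESHOLD
```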

Another simple yet effective method I implemented was to include a robots.txt file. This file provides instructions to well-behaved web crawlers, specifying which parts of the site they are allowed to access. For example, I can disallow crawlers from scraping coupon pages or other resource-intensive sections of Sharepon. However, it’s important to note that robots.txt relies on crawler compliance; it won’t stop malicious bots that choose to ignore it. While not a standalone solution, it’s a helpful additional step in managing the server load caused by web crawlers.
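
A robots.txt along these lines (the paths are illustrative, not Sharepon’s real routes) tells compliant crawlers to stay away from coupon and search pages while leaving the rest of the site open:

```
User-agent: *
Disallow: /coupons/
Disallow: /search
```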

Looking back, if I had known all this from the start, I would have implemented reCAPTCHA right away. While the journey was challenging, it taught me valuable lessons about vigilance and adaptability. Battling bots isn’t just about protecting your app. It’s about preserving the trust and experience of your community.

If you’re building a web app, I hope my story inspires you to take proactive steps against bots from the beginning. Every app’s battle will be unique, but the key is to learn, adapt, and always prioritize your users’ experience.