Analyzing Phishing Infrastructure and Attack Patterns Using Daily Malicious Phishing Data

Phishing remains one of the most persistent and practical cybersecurity threats today. Attackers no longer rely on a single, fixed set of infrastructure; instead, they continuously rotate domains, platforms, and content to evade detection.

So what does real-world phishing infrastructure actually look like right now? To reflect this evolving landscape, Criminal IP provides a publicly available dataset through the Daily-Mal-Phishing repository, offering daily updated phishing URL samples at no cost. While this dataset represents only a portion of the broader threat intelligence collected, it is still sufficient to reveal how modern phishing campaigns are structured and operated. More importantly, the dataset can be downloaded and analyzed immediately without any complex setup, making it a practical starting point for security beginners or anyone looking to explore data-driven threat analysis.

In this article, we analyze phishing infrastructure and operational patterns based on the Daily-Mal-Phishing dataset and demonstrate how this GitHub repository can be effectively used in real-world analysis.

What is the Daily-Mal-Phishing Repository?

Daily-Mal-Phishing is a publicly available threat intelligence dataset that provides malicious and phishing URLs collected via Criminal IP Domain Search on a daily basis.

  • Update frequency: Daily
  • Data source: Criminal IP Domain Search (Dangerous / Critical URLs)
  • Format: CSV / TXT files by date
  • Cost: Free (sample dataset)

Rather than being a simple reference list, it reflects actively detected phishing infrastructure, making it useful for tracking ongoing threat trends.

Daily-Mal-Phishing dataset summary
  • Analysis period: September 2024 – April 2026
  • Total URLs: 52,670
  • Unique domains: 51,126
  • Unique countries: 91

Despite being a public sample dataset, it includes meaningful metadata such as domain, country, and registration time, allowing for structured analysis of phishing infrastructure rather than just reviewing raw URLs.
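As a rough illustration of that kind of structured analysis, the sketch below parses one CSV file and reproduces the three summary figures (total URLs, unique domains, unique countries). The column names `url`, `domain`, `country`, and `registered` are assumptions made for this example, as are the sample rows; check the header of the actual files in the repository before reusing it.

```python
import csv
import io

# Hypothetical rows in an ASSUMED column layout; these are not real dataset entries.
SAMPLE_CSV = """url,domain,country,registered
https://login.example-a.bond/index,example-a.bond,US,2025-01-03
https://example-b.vercel.app/netflix,example-b.vercel.app,US,2025-01-04
https://example-a.bond/verify,example-a.bond,US,2025-01-03
https://shop.example-c.click/,example-c.click,DE,2025-02-11
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def summarize(rows):
    """Return total URL count, unique domain count, and unique country count."""
    return {
        "total_urls": len(rows),
        "unique_domains": len({r["domain"].lower() for r in rows}),
        "unique_countries": len({r["country"] for r in rows if r["country"]}),
    }


rows = load_rows(SAMPLE_CSV)
print(summarize(rows))
```

To run this against a downloaded file, read the file's text and pass it to `load_rows` in place of `SAMPLE_CSV`.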

Key Findings

1. Monthly Phishing URL Trend (September 2024 to April 2026)

Across most months, approximately 2,500 to 3,000 phishing URLs were consistently observed. This indicates that phishing is not a one-time, event-driven activity, but rather an ongoing operational model.

Even when one campaign ends, attackers quickly shift to new brands, domains, and hosting environments. The sustained volume in April 2026, exceeding 3,000 URLs, reinforces that phishing campaigns are continuously maintained rather than sporadic.
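A monthly trend like this can be rebuilt by bucketing records by month. The sketch below assumes each record carries an ISO date string (for example, one derived from the dated file it came from); that field is an assumption for illustration, not a documented part of the dataset.

```python
from collections import Counter
from datetime import date


def monthly_counts(iso_dates):
    """Count records per calendar month, keyed as 'YYYY-MM' in chronological order."""
    counts = Counter()
    for text in iso_dates:
        day = date.fromisoformat(text)  # raises ValueError on malformed dates
        counts[f"{day.year:04d}-{day.month:02d}"] += 1
    return dict(sorted(counts.items()))


# Hypothetical dates, one per collected URL.
print(monthly_counts(["2024-09-01", "2024-09-15", "2024-10-02"]))
```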

2. TLD and Domain Generation Strategy

TLD Distribution and Top Domains

A Top-Level Domain (TLD) refers to the last segment of a domain name (e.g., .com, .app). In phishing analysis, TLD distribution provides insight into how attackers generate and manage domains.

Top 5 TLDs observed:

.com → 11,333
.bond → 8,454
.app → 5,926
.dev → 5,399
.click → 2,610

While .com remains dominant, lower-cost and easily registrable TLDs such as .bond and .click are heavily utilized. This suggests that attackers prioritize scalability and speed, generating large numbers of domains and rotating them frequently to evade detection.

Modern phishing operations are no longer centered around a single domain but instead rely on a domain pool strategy, where multiple short-lived domains are continuously cycled.
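A TLD count of this kind can be approximated with the Python standard library alone. The sketch below takes the last label of the hostname, which matches the definition of a TLD given above; it deliberately does not handle multi-label public suffixes such as .co.uk, which would require the Public Suffix List. The function names and sample URLs are made up for this example.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last hostname label (e.g. '.bond'), or None for IPs and bad input."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat a bare 'host/path' as a network location
    try:
        host = (urlsplit(url).hostname or "").rstrip(".")
    except ValueError:
        return None
    if "." not in host:
        return None
    label = host.rsplit(".", 1)[1]
    return None if label.isdigit() else "." + label


def tld_counts(urls):
    """Count URLs per TLD, skipping entries without a usable hostname."""
    return Counter(t for t in map(tld_of, urls) if t is not None)


# Hypothetical URLs for illustration only.
sample = ["https://a.example.bond/x", "login.example.com/path", "http://192.0.2.1/"]
print(tld_counts(sample).most_common(5))
```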

3. Platform-Based Phishing Infrastructure

Phishing URLs by Platform

Analyzing which platforms phishing URLs are actually hosted on provides a clearer view of the attacker’s infrastructure strategy. In this dataset, legitimate platforms such as Vercel, GitHub Pages, Blogger, Wix Studio, and Framer were repeatedly used to host phishing URLs.

The distribution shows 3,519 cases on Vercel, 1,946 on GitHub Pages, 1,088 on Blogger, 883 on Wix Studio, and 465 on Framer. This indicates that attackers are no longer relying solely on setting up and operating their own servers. Instead, they leverage trusted platforms to deploy pages quickly and use the built-in HTTPS and CDN environments to reduce user suspicion. When a page is taken down, they can redeploy a new one on the same platform, which keeps operations efficient.

This is also an important point for beginners in cybersecurity. Phishing sites are often assumed to be hosted on suspicious servers or unknown hosting environments, but in reality, familiar and legitimate platforms are frequently abused. Therefore, rather than judging based only on the appearance of a URL, it is important to develop the habit of considering the underlying platform and infrastructure as well.
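One simple way to build that habit is to check whether a hostname sits under a known hosting suffix. In the sketch below, the suffix-to-platform mapping is an assumption made for illustration: the article names the platforms but not the hostnames they serve from, and the list is almost certainly incomplete (several of these platforms serve sites from more than one domain).

```python
# ASSUMED hostname suffixes for each platform; verify and extend before real use.
PLATFORM_SUFFIXES = {
    "vercel.app": "Vercel",
    "github.io": "GitHub Pages",
    "blogspot.com": "Blogger",
    "wixstudio.com": "Wix Studio",
    "framer.app": "Framer",
}


def platform_of(host):
    """Return the platform name if host equals or is a subdomain of a known suffix."""
    host = host.lower().rstrip(".")
    for suffix, name in PLATFORM_SUFFIXES.items():
        # Require a label boundary so 'notvercel.app' does not match 'vercel.app'.
        if host == suffix or host.endswith("." + suffix):
            return name
    return None


# Hypothetical hostnames for illustration only.
for h in ["evil-login.vercel.app", "user.github.io", "notvercel.app"]:
    print(h, "->", platform_of(h))
```

Matching on a label boundary rather than a plain substring avoids false positives from look-alike domains that merely contain a platform name.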

4. Geographic Distribution of Infrastructure

Top 10 Countries Hosting Phishing URLs

In terms of geographic distribution, the United States accounted for by far the largest share with 26,113 cases, roughly half of the 52,670 URLs in the dataset, followed by Hong Kong (2,079), Germany (2,071), Singapore (2,029), and Japan (514).

However, these figures do not directly indicate the actual location of the attackers. Rather, they more accurately reflect where global cloud and hosting infrastructure is concentrated. The United States hosts a large portion of major CDN, SaaS, static hosting, and cloud services, making it an environment where attackers can easily achieve both high availability and credibility. As a result, phishing infrastructure is frequently observed to be centered in the U.S., but this is less about the attackers’ origin and more about the operational environment they choose.

This interpretation is also important in practice. Instead of concluding “which country the attacker is from” based solely on geographic data, it is more effective to focus on which regions and types of infrastructure are being repeatedly abused.
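To put the country figures in proportion, the short calculation below converts the counts quoted above into shares of the dataset. It assumes the 52,670 total URLs is the right denominator, which the article does not state explicitly.

```python
TOTAL_URLS = 52_670  # total URLs from the dataset summary (assumed denominator)

# Country counts as quoted in this section.
COUNTRY_COUNTS = {
    "United States": 26_113,
    "Hong Kong": 2_079,
    "Germany": 2_071,
    "Singapore": 2_029,
    "Japan": 514,
}


def shares(counts, total):
    """Return each count as a percentage of total."""
    return {name: 100.0 * n / total for name, n in counts.items()}


for country, pct in shares(COUNTRY_COUNTS, TOTAL_URLS).items():
    print(f"{country}: {pct:.1f}%")
```

Under that assumption the United States comes out at just under half of all URLs, with each of the next three countries below 4%.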

From GitHub Data to Real-World Analysis

Phishing URLs related to Netflix identified in Daily-Mal-Phishing

By filtering the Daily-Mal-Phishing dataset using keywords such as “Netflix,” hundreds of phishing URLs can be identified. While this helps reveal attack patterns, it does not provide deeper insight into infrastructure or risk context.
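The keyword filter described above can be sketched as a case-insensitive substring match over the URL list. The function name and sample URLs are hypothetical, and a plain substring match will miss deliberate misspellings of a brand name.

```python
def filter_by_keyword(urls, keyword):
    """Return the URLs that contain keyword, ignoring case."""
    needle = keyword.lower()
    return [u for u in urls if needle in u.lower()]


# Hypothetical URLs for illustration only.
sample_urls = [
    "https://netflix-billing.example.bond/login",
    "https://example.vercel.app/NETFLIX/update",
    "https://shop.example.click/",
]
print(filter_by_keyword(sample_urls, "netflix"))
```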

This is where Criminal IP Domain Search becomes valuable.

Example of a phishing domain analyzed in Criminal IP Domain Search

When a URL from the GitHub dataset is entered into Domain Search, it expands from a simple string into a comprehensive intelligence profile. This includes risk classification (e.g., Critical or Dangerous), domain metadata such as registration time and country, associated IP addresses, infrastructure details, SSL configuration, and underlying technologies. In some cases, a preview of the actual webpage is also available, enabling analysts to quickly understand how the phishing page operates.

The analysis can then extend beyond a single domain. By pivoting on the associated IP address using Criminal IP Asset Search, additional related assets hosted within the same infrastructure can be identified. This allows analysts to move from a single indicator to a broader infrastructure-level investigation.

In practice, this creates a clear analytical workflow: using the GitHub dataset to identify patterns, validating individual domains through Domain Search, and expanding the investigation to infrastructure through Asset Search. This demonstrates how publicly available data and operational analysis tools can be effectively combined.

Conclusion

The Daily-Mal-Phishing dataset is not just a simple public sample, but a practical entry point for quickly observing real-world phishing infrastructure trends. Even as a free dataset updated daily, it provides enough visibility to understand current attack patterns, and its accessibility, without requiring any complex setup, is one of its biggest advantages.

For those just starting out with phishing analysis, this repository serves as an effective starting point to build a data-driven analytical mindset. From there, individual domains can be further analyzed using Criminal IP Domain Search and Phishing Search, enabling deeper insights that go beyond simple URL lists and extend into actual attack infrastructure.

The full code and dataset can be accessed directly from the GitHub repository below.

https://github.com/criminalip/Daily-Mal-Phishing