How to Use Akismet to Combat Search Query Spam

Keyword Spam in Google Search Console

One thing malicious users like to do is build a directory of spam links that point at your site's search page, with the spam stuffed into the search parameter. Depending on the site, the search query may be displayed on the results page, and if it is not properly escaped, the spammer will have successfully inserted a link onto your site.

Even if you’re protected from malicious search queries, it’s still annoying that someone is trying to hijack your search results. Thankfully, there are ways to do a spam check on these queries using Akismet.

Akismet is great for comment spam; at least, that’s what it is most known for. But you can use Akismet to check for other types of spam. For example, you can use Akismet to check for form spam.

In this particular case, I want to check search queries for spam. Akismet provides an API endpoint called comment-check, which I have repurposed to check for search query spam.

Without further delay, here is the code I used to eliminate most of the spam queries.

<?php
/**
 * Plugin Name:       Akismet Search Check
 * Plugin URI:        https://mediaron.com
 * Description:       Filter search terms through Akismet
 * Version:           1.0.0
 * Requires at least: 6.0
 * Requires PHP:      7.3
 * Author:            Ronald Huereca
 * Author URI:        https://mediaron.com
 * License:           GPL v2 or later
 * License URI:       https://www.gnu.org/licenses/gpl-2.0.html
 *
 * @package AkismetSearchCheck
 */

// Set to false to effectively disable this plugin and not check for spam.
define( 'MR_AKISMET_SEARCH_CHECK_ENABLED', true );

// Queries longer than this are sent to Akismet; anything at or below it is skipped. Set to zero to check everything.
define( 'MR_AKISMET_SEARCH_CHAR_MAX_LENGTH', 30 );

// Queries at or above this length are rejected outright. Set to zero to disable.
define( 'MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH', 75 );

// Skip Logged-in Users. Set to false to check logged-in users.
define( 'MR_AKISMET_SEARCH_SKIP_LOGGED_IN_USERS', true );

// Akismet content type. This is used to identify the content type in Akismet.
define( 'MR_AKISMET_CONTENT_TYPE', 'search-query' );

/**
 * Class to handle search redirects. Class used to avoid global vars.
 */
class MR_Search_Redirect {

	/**
	 * A variable to check if we've already validated the search.
	 *
	 * @var bool
	 */
	public static $already_checked = false;

}
// Exit early if disabled.
if ( ! MR_AKISMET_SEARCH_CHECK_ENABLED ) {
	return;
}

/**
 * Get a user's IP address.
 */
function mr_search_get_user_ip() {
	if ( array_key_exists( 'HTTP_X_FORWARDED_FOR', $_SERVER ) && ! empty( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {
		if ( strpos( $_SERVER['HTTP_X_FORWARDED_FOR'], ',' ) > 0 ) {
			$addr = explode( ',', $_SERVER['HTTP_X_FORWARDED_FOR'] );
			return trim( $addr[0] );
		} else {
			return $_SERVER['HTTP_X_FORWARDED_FOR'];
		}
	} else {
		return $_SERVER['REMOTE_ADDR'];
	}
}

// Check search query for spam with Akismet.
add_action(
	'pre_get_posts',
	function ( $query ) {
		// REST API exclusion check.
		if ( defined( 'REST_REQUEST' ) && REST_REQUEST ) {
			return;
		}
		// Skip if doing AJAX search (e.g., Jetpack Search).
		if ( defined( 'DOING_AJAX' ) && DOING_AJAX ) {
			return;
		}
		// Skip in the admin panel searches.
		if ( is_admin() ) {
			return;
		}
		// Skip logged-in users.
		if ( is_user_logged_in() && MR_AKISMET_SEARCH_SKIP_LOGGED_IN_USERS ) {
			return;
		}

		// Get the search query, if any.
		$search_query = $query->get( 's' );

		// Are we in a search query?
		if ( ! empty( $search_query ) ) {
			// We are in a search query.
			// Check to see if we've already checked.
			if ( MR_Search_Redirect::$already_checked ) {
				return;
			}

			// Mark the search as checked for this request.
			MR_Search_Redirect::$already_checked = true;

			// Check if it's over a certain number of characters. If so, it's likely malicious.
			if ( 0 < MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH && strlen( $search_query ) >= MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH ) {
				// Reject with a 500 status; no redirect happens here.
				status_header( 500, 'Cannot process search request.' );
				exit;
			}

			// If less than limit, we can return early and not check the query.
			if ( strlen( $search_query ) <= MR_AKISMET_SEARCH_CHAR_MAX_LENGTH && 0 !== MR_AKISMET_SEARCH_CHAR_MAX_LENGTH ) {
				return;
			}

			// Check akismet for spam.
			if ( class_exists( 'Akismet' ) ) {
				$akismet_api_key = \Akismet::get_api_key();
				if ( $akismet_api_key ) {
					$akismet_fields                         = array();
					$akismet_fields['blog']                 = esc_url( home_url() );
					$akismet_fields['comment_type']         = MR_AKISMET_CONTENT_TYPE;
					$akismet_fields['comment_content']      = sanitize_text_field( $search_query );
					$akismet_fields['contact_form_subject'] = sanitize_text_field( $search_query );
					$akismet_fields['comment_author_IP']    = mr_search_get_user_ip();
					$akismet_fields['user_ip']              = mr_search_get_user_ip();
					$akismet_fields['referrer']             = sanitize_text_field( $_SERVER['HTTP_REFERER'] ?? '' );

					// Get all the fields and consolidate.
					$akismet_fields = http_build_query( $akismet_fields );

					// Submit spam check.
					$response = \Akismet::http_post( $akismet_fields, 'comment-check' );

					// Get spam response.
					$maybe_spam = (bool) filter_var( $response[1] ?? false, FILTER_VALIDATE_BOOLEAN );
					if ( $maybe_spam ) {
						// Set 500 error header.
						status_header( 500, 'Cannot process search request.' );
						exit;
					}
				}
			}
		}
	},
	1 /* super high priority */
);

Let’s break the code down so you can know exactly how this works.

The constants

Constants are used in order for quick configuration. Let’s go over the constants used for the search query spam protection.

MR_AKISMET_SEARCH_CHECK_ENABLED

// Set to false to effectively disable this plugin and not check for spam.
define( 'MR_AKISMET_SEARCH_CHECK_ENABLED', true );

This is a top-level flag and, if false, will prevent any spam checking.

MR_AKISMET_SEARCH_CHAR_MAX_LENGTH

// Queries longer than this are sent to Akismet; anything at or below it is skipped. Set to zero to check everything.
define( 'MR_AKISMET_SEARCH_CHAR_MAX_LENGTH', 30 );

Queries longer than this many characters are sent to Akismet; anything at or below the limit skips the check. You can set this to zero (0) to check everything.

MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH

// Queries at or above this length are rejected outright. Set to zero to disable.
define( 'MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH', 75 );

Any query of this length (75) or longer will be automatically rejected without being sent to Akismet.
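
To make the interplay of the two length limits concrete, here is a minimal standalone sketch. The function name mr_classify_search_length() and its string return values are hypothetical and not part of the plugin; only the comparisons are copied from the plugin code.

```php
<?php
/**
 * Classify a search query by length, mirroring the plugin's comparisons.
 * Hypothetical helper for illustration only.
 *
 * @param string $search_query  The raw search term.
 * @param int    $max_length    Queries at or below this length skip Akismet (0 = check everything).
 * @param int    $reject_length Queries at or above this length are rejected (0 = never reject on length).
 * @return string 'reject', 'skip', or 'check'.
 */
function mr_classify_search_length( string $search_query, int $max_length = 30, int $reject_length = 75 ): string {
	$length = strlen( $search_query ); // Byte length, same as the plugin.

	// The reject test runs first, exactly as in the plugin.
	if ( 0 < $reject_length && $length >= $reject_length ) {
		return 'reject';
	}
	// Short queries are assumed harmless and never reach Akismet.
	if ( 0 !== $max_length && $length <= $max_length ) {
		return 'skip';
	}
	return 'check';
}

echo mr_classify_search_length( 'wordpress plugins' ), "\n";      // skip
echo mr_classify_search_length( str_repeat( 'a', 31 ) ), "\n";    // check
echo mr_classify_search_length( str_repeat( 'a', 75 ) ), "\n";    // reject
```

Because strlen() counts bytes, queries in multibyte scripts reach both limits sooner than their visible character count suggests.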

MR_AKISMET_SEARCH_SKIP_LOGGED_IN_USERS

// Skip Logged-in Users. Set to false to check logged-in users.
define( 'MR_AKISMET_SEARCH_SKIP_LOGGED_IN_USERS', true );

If true, logged-in users will not be checked for search spam.

MR_AKISMET_CONTENT_TYPE

// Akismet content type. This is used to identify the content type in Akismet.
define( 'MR_AKISMET_CONTENT_TYPE', 'search-query' );

The Akismet content type should be a unique slug relevant to your site and what it’s trying to do. For example, for this site, I would use mediaron-com-search-query or something similar.

A static variable

In order to avoid using a global, I have created a class that will hold a result I need in a static class variable. In this particular case, I want to make sure that the search query isn’t checked multiple times per load.

/**
 * Class to handle search redirects. Class used to avoid global vars.
 */
class MR_Search_Redirect {

	/**
	 * A variable to check if we've already validated the search.
	 *
	 * @var bool
	 */
	public static $already_checked = false;

}
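
The flag matters because pre_get_posts fires for every query WordPress builds during a page load, not just the first one. The guard itself can be sketched without WordPress; MR_Once_Guard and its run() method are hypothetical names used only for this illustration.

```php
<?php
/**
 * Runs a callback at most once per request. Illustration only.
 */
final class MR_Once_Guard {

	/**
	 * Whether the callback has already run.
	 *
	 * @var bool
	 */
	private static $already_checked = false;

	/**
	 * Run the check unless it has already run.
	 *
	 * @param callable $check The work to perform once.
	 * @return bool True if the check ran, false if it was skipped.
	 */
	public static function run( callable $check ): bool {
		if ( self::$already_checked ) {
			return false;
		}
		// Set the flag before running, as the plugin does, so a nested call is also skipped.
		self::$already_checked = true;
		$check();
		return true;
	}
}

$calls = 0;
$check = function () use ( &$calls ) {
	$calls++;
};

MR_Once_Guard::run( $check );
MR_Once_Guard::run( $check );
MR_Once_Guard::run( $check );

echo $calls, "\n"; // Prints 1.
```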

Exit early

If the spam check is disabled, there’s no need to load anything else. You can return early.

// Exit early if disabled.
if ( ! MR_AKISMET_SEARCH_CHECK_ENABLED ) {
	return;
}

Getting the user’s IP address

One piece of data that helps Akismet is the IP address of the user. I have created a function that will attempt to return this value.

/**
 * Get a user's IP address.
 */
function mr_search_get_user_ip() {
	if ( array_key_exists( 'HTTP_X_FORWARDED_FOR', $_SERVER ) && ! empty( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {
		if ( strpos( $_SERVER['HTTP_X_FORWARDED_FOR'], ',' ) > 0 ) {
			$addr = explode( ',', $_SERVER['HTTP_X_FORWARDED_FOR'] );
			return trim( $addr[0] );
		} else {
			return $_SERVER['HTTP_X_FORWARDED_FOR'];
		}
	} else {
		return $_SERVER['REMOTE_ADDR'];
	}
}
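
One caveat: unless a trusted proxy overwrites it, X-Forwarded-For is supplied by the client, so it can contain anything. Below is a sketch of a stricter variant that validates each candidate before returning it. The function name mr_search_get_validated_ip() and the $server parameter are my own assumptions for illustration; filter_var() with FILTER_VALIDATE_IP is standard PHP.

```php
<?php
/**
 * Return the first syntactically valid IP from the forwarded header or
 * REMOTE_ADDR, or an empty string if neither is valid. Illustration only.
 *
 * @param array $server Typically $_SERVER; passed in so the function is testable.
 * @return string
 */
function mr_search_get_validated_ip( array $server ): string {
	$candidates = array();

	if ( ! empty( $server['HTTP_X_FORWARDED_FOR'] ) ) {
		// The header may hold a comma-separated chain; the first entry is the client.
		$parts        = explode( ',', (string) $server['HTTP_X_FORWARDED_FOR'] );
		$candidates[] = trim( $parts[0] );
	}
	if ( ! empty( $server['REMOTE_ADDR'] ) ) {
		$candidates[] = trim( (string) $server['REMOTE_ADDR'] );
	}

	foreach ( $candidates as $candidate ) {
		// filter_var() returns false for anything that is not an IPv4 or IPv6 address.
		if ( false !== filter_var( $candidate, FILTER_VALIDATE_IP ) ) {
			return $candidate;
		}
	}
	return '';
}

echo mr_search_get_validated_ip(
	array(
		'HTTP_X_FORWARDED_FOR' => '203.0.113.7, 10.0.0.1',
		'REMOTE_ADDR'          => '10.0.0.1',
	)
), "\n"; // 203.0.113.7
```

In the plugin you would call it as mr_search_get_validated_ip( $_SERVER ). Validation only confirms the value looks like an IP address; it does not prove the header is honest.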

Using pre_get_posts to capture the query

WordPress action pre_get_posts is an excellent way to capture or modify a WordPress query. In this case, I’ll be using it to determine if we’re in a search query, and if so, check it for spam.

First, I’ll need to do some sanity checks. I want to exclude REST requests, Ajax requests, admin search requests, and searches by logged-in users.

// Check search query for spam with Akismet.
add_action(
	'pre_get_posts',
	function ( $query ) {
		// REST API exclusion check.
		if ( defined( 'REST_REQUEST' ) && REST_REQUEST ) {
			return;
		}
		// Skip if doing AJAX search (e.g., Jetpack Search).
		if ( defined( 'DOING_AJAX' ) && DOING_AJAX ) {
			return;
		}
		// Skip in the admin panel searches.
		if ( is_admin() ) {
			return;
		}
		// Skip logged-in users.
		if ( is_user_logged_in() && MR_AKISMET_SEARCH_SKIP_LOGGED_IN_USERS ) {
			return;
		}
		/* more code here */
	},
	1 /* super high priority */
);

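In the code above, the checks make sure the spam check only runs for front-end searches and, by default, only for visitors who aren’t logged in. Note that the callback doesn’t call is_main_query(); the static flag from earlier is what keeps the search from being checked more than once per request.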

Now we can get the search query and check its length to see if it’s suitable for checking with Akismet.

// Get the search query, if any.
$search_query = $query->get( 's' );

// Are we in a search query?
if ( ! empty( $search_query ) ) {
	// We are in a search query.
	// Check to see if we've already checked.
	if ( MR_Search_Redirect::$already_checked ) {
		return;
	}

	// Mark the search as checked for this request.
	MR_Search_Redirect::$already_checked = true;

	// Check if it's over a certain number of characters. If so, it's likely malicious.
	if ( 0 < MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH && strlen( $search_query ) >= MR_AKISMET_SEARCH_CHAR_REJECT_LENGTH ) {
		// Reject with a 500 status; no redirect happens here.
		status_header( 500, 'Cannot process search request.' );
		exit;
	}

	// If less than limit, we can return early and not check the query.
	if ( strlen( $search_query ) <= MR_AKISMET_SEARCH_CHAR_MAX_LENGTH && 0 !== MR_AKISMET_SEARCH_CHAR_MAX_LENGTH ) {
		return;
	}

	/* More code here */
}

If, for example, a search query is too long, then status_header is used to set a 500 error. You may want to adjust the error code set here, but if it is search engine spam, it should be forcefully rejected so that Google knows to delist any current spam entries it has in its index.

Next is checking the search query with Akismet.

// Check akismet for spam.
if ( class_exists( 'Akismet' ) ) {
	$akismet_api_key = \Akismet::get_api_key();
	if ( $akismet_api_key ) {
		$akismet_fields                         = array();
		$akismet_fields['blog']                 = esc_url( home_url() );
		$akismet_fields['comment_type']         = MR_AKISMET_CONTENT_TYPE;
		$akismet_fields['comment_content']      = sanitize_text_field( $search_query );
		$akismet_fields['contact_form_subject'] = sanitize_text_field( $search_query );
		$akismet_fields['comment_author_IP']    = mr_search_get_user_ip();
		$akismet_fields['user_ip']              = mr_search_get_user_ip();
		$akismet_fields['referrer']             = sanitize_text_field( $_SERVER['HTTP_REFERER'] ?? '' );

		// Get all the fields and consolidate.
		$akismet_fields = http_build_query( $akismet_fields );

		// Submit spam check.
		$response = \Akismet::http_post( $akismet_fields, 'comment-check' );

		// Get spam response.
		$maybe_spam = (bool) filter_var( $response[1] ?? false, FILTER_VALIDATE_BOOLEAN );
		if ( $maybe_spam ) {
			// Set 500 error header.
			status_header( 500, 'Cannot process search request.' );
			exit;
		}
	}
}

The request is sent to Akismet’s comment-check endpoint. I’m passing the search query as both the body and the subject, along with the custom content type (comment type).
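
To see what that request body looks like on the wire, here is a standalone sketch. The field values are made-up examples; http_build_query() is the same PHP function the plugin calls.

```php
<?php
// Example values only; the plugin fills these from the live request.
$search_query = 'cheap pills buy now example-spam-domain.test';

$akismet_fields = array(
	'blog'                 => 'https://example.com',
	'comment_type'         => 'search-query',
	'comment_content'      => $search_query,
	'contact_form_subject' => $search_query,
	'user_ip'              => '203.0.113.7',
	'referrer'             => '',
);

// Produces a URL-encoded string such as blog=https%3A%2F%2Fexample.com&comment_type=search-query&...
$body = http_build_query( $akismet_fields );
echo $body, "\n";

// Decoding the string gives back the original fields, so nothing is lost in transit.
parse_str( $body, $decoded );
var_dump( $decoded === $akismet_fields );
```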

If spam is detected, a 500 error is initiated.
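
The plugin reads the verdict from index 1 of the array that Akismet::http_post() returns, and comment-check answers with a plain-text body of true or false. Here is a sketch of that interpretation in isolation. The helper mr_is_spam_response() is hypothetical, and the sample arrays are hand-written stand-ins rather than captured responses.

```php
<?php
/**
 * Interpret a comment-check style response the way the plugin does.
 * Hypothetical helper for illustration only.
 *
 * @param mixed $response Expected shape: array( headers, body ).
 * @return bool True only when the body is a truthy string such as "true".
 */
function mr_is_spam_response( $response ): bool {
	$body = is_array( $response ) ? ( $response[1] ?? false ) : false;

	// Anything that is not a recognized boolean string evaluates to false.
	return (bool) filter_var( $body, FILTER_VALIDATE_BOOLEAN );
}

var_dump( mr_is_spam_response( array( array(), 'true' ) ) );    // Flagged as spam.
var_dump( mr_is_spam_response( array( array(), 'false' ) ) );   // Not spam.
var_dump( mr_is_spam_response( array( '', '' ) ) );             // Empty body, treated as not spam.
var_dump( mr_is_spam_response( array( array(), 'invalid' ) ) ); // Unrecognized body, treated as not spam.
```

The design choice worth noticing is that the check fails open: if the request to Akismet fails or returns something unexpected, the search goes ahead instead of being blocked.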

Will this slow down your site?

TLDR: it shouldn’t.

The only part that can slow down a page load is the round trip to Akismet when a search term is sent for spam checking. In my tests, this is minimized by setting sane minimum and maximum character limits, so that most searches never reach Akismet at all.

That’s it! Comments?

Please check out the Gist for this and please leave any comments you might have below.

Ronald Huereca

Ronald has been part of the WordPress community since 2006, starting off writing and eventually diving into WordPress plugin development and writing tutorials and opinionated pieces.

No stranger to controversy and opinionated takes on tough topics, Ronald writes honestly when he covers a topic.

MediaRon - Ronald Huereca

Ronald created MediaRon in 2011 and has more than fifteen years of releasing free and paid WordPress plugins.