WP_REST_URL_Details_Controller::get_meta_with_content_elements( string $html ): array


This function’s access is marked private. This means it is not intended for use by plugin or theme developers, only in other core functions. It is listed here for completeness.

Gets all the meta tag elements that have a ‘content’ attribute.

Parameters

$html string required
The string of HTML to be parsed.

Return

array A multi-dimensional indexed array on success, else empty array.
  • 0 string[]
    Meta elements with a content attribute.
  • 1 string[]
    Content attribute’s opening quotation mark.
  • 2 string[]
    Content attribute’s value for each meta element.
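The method is private, so this sketch is for illustration only. It copies the pattern into a hypothetical standalone function, `meta_with_content_elements_sketch()`, which is not a WordPress function. It then runs that function on a small, made-up HTML snippet to show how the three indexed groups line up.

```php
<?php
// Hypothetical standalone copy of the pattern built in
// get_meta_with_content_elements(); not a WordPress function.
function meta_with_content_elements_sketch( string $html ): array {
	$pattern = '#<meta\s[^>]*content=(["\']??)(.*)\1[^>]*\/?>#isU';
	preg_match_all( $pattern, $html, $elements );
	return $elements;
}

// Made-up sample input: the charset tag has no content attribute, the
// description value contains an apostrophe and HTML, and the last tag
// uses single quotes.
$html = <<<'HTML'
<meta charset="utf-8">
<meta name="description" content="It's a <b>bold</b> claim" />
<meta property='og:title' content='Hello'>
HTML;

$elements = meta_with_content_elements_sketch( $html );

foreach ( $elements[0] as $i => $element ) {
	// [0] full element, [1] opening quotation mark, [2] content value.
	echo $element, PHP_EOL;
	echo '  quote: ', $elements[1][ $i ], '  value: ', $elements[2][ $i ], PHP_EOL;
}
```

The `<meta charset="utf-8">` tag is skipped because it has no content attribute. The apostrophe in `It's` does not end the double-quoted value, because the closing mark must match the captured opening one (`\1`).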

Source

private function get_meta_with_content_elements( $html ) {
	/*
	 * Parse all meta elements with a content attribute.
	 *
	 * Why first search for the content attribute rather than directly searching for the name=description element?
	 * tl;dr The content attribute's value will be truncated when it contains a > symbol.
	 *
	 * The content attribute's value (i.e. the description to get) can have HTML in it and be well-formed as
	 * it's a string to the browser. Imagine what happens when attempting to match for the name=description
	 * first. Hmm, if a > or /> symbol is in the content attribute's value, then it terminates the match
	 * as the element's closing symbol. But wait, it's in the content attribute and is not the end of the
	 * element. This is a limitation of using regex. It can't determine "wait a minute, this is inside of a quotation".
	 * If this happens, what gets matched is not the entire element or all of the content.
	 *
	 * Why not search for the name=description and then content="(.*)"?
	 * The attribute order could be opposite. Plus, additional attributes may exist including being between
	 * the name and content attributes.
	 *
	 * Why not lookahead?
	 * Lookahead is not constrained to stay within the element. The first <meta it finds may not include
	 * the name or content, but rather could be from a different element downstream.
	 */
	$pattern = '#<meta\s' .

			/*
			 * Allows for additional attributes before the content attribute.
			 * Searches for anything other than > symbol.
			 */
			'[^>]*' .

			/*
			 * Find the content attribute. When found, capture its value (.*).
			 *
			 * Allows for (a) single or double quotes and (b) whitespace in the value.
			 *
			 * Why capture the opening quotation mark, i.e. (["\']), and then backreference,
			 * i.e. \1, for the closing quotation mark?
			 * To ensure the closing quotation mark matches the opening one. Why? Attribute values
			 * can contain quotation marks, such as an apostrophe in the content.
			 */
			'content=(["\']??)(.*)\1' .

			/*
			 * Allows for additional attributes after the content attribute.
			 * Searches for anything other than > symbol.
			 */
			'[^>]*' .

			/*
			 * \/?> searches for the closing > symbol, which can be in either /> or > format.
			 * # ends the pattern.
			 */
			'\/?>#' .

			/*
			 * These are the options:
			 * - i : case insensitive
			 * - s : allows newline characters for the . match (needed for multiline elements)
			 * - U : ungreedy, i.e. quantifiers are lazy by default (and a trailing ? makes them greedy)
			 */
			'isU';

	preg_match_all( $pattern, $html, $elements );

	return $elements;
}
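The opening comment explains why the pattern looks for the content attribute first instead of matching the `name=description` element directly. Here is a minimal sketch of that difference. It uses two hypothetical helpers, `naive_description_element()` and `two_stage_description()`, neither of which exists in WordPress, and a made-up description that contains a `>` symbol.

```php
<?php
// Hypothetical helper: match the name="description" element directly.
// The first > after the name attribute ends the match, even inside a value.
function naive_description_element( string $html ): string {
	return preg_match( '#<meta\s[^>]*name="description"[^>]*>#i', $html, $m ) ? $m[0] : '';
}

// Hypothetical helper: first collect every meta element with a content
// attribute (same pattern as get_meta_with_content_elements()), then return
// the value of the one whose name attribute is "description".
// For brevity it does not guard against "name=description" appearing
// inside another attribute's value.
function two_stage_description( string $html ): string {
	$pattern = '#<meta\s[^>]*content=(["\']??)(.*)\1[^>]*\/?>#isU';
	preg_match_all( $pattern, $html, $elements );

	foreach ( $elements[0] as $i => $element ) {
		if ( preg_match( '#\sname=(["\']?)description\1#i', $element ) ) {
			return $elements[2][ $i ];
		}
	}
	return '';
}

$html = '<meta property="og:title" content="Hi"><meta name="description" content="Use a > b" />';

echo naive_description_element( $html ), PHP_EOL; // <meta name="description" content="Use a >
echo two_stage_description( $html ), PHP_EOL;     // Use a > b
```

The naive match stops at the `>` inside the value, so the element it returns is truncated. The content-first pattern treats that `>` as part of the value and recovers it whole.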

Changelog

Version   Description
5.9.0     Introduced.
