WP_REST_URL_Details_Controller::get_meta_with_content_elements() – Method

Gets all the meta tag elements that have a ‘content’ attribute.

Parameters

$html (string, required): The string of HTML to be parsed.

Return

array A multi-dimensional indexed array on success, else empty array.

0 string[]
Meta elements with a content attribute.
1 string[]
Content attribute’s opening quotation mark.
2 string[]
Content attribute’s value for each meta element.
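
The shape of the returned array is easiest to see on a small input. The sketch below is illustrative only: the method is private, so it runs the same regular expression (condensed into one string) through preg_match_all() directly, and the sample HTML is made up.

```php
<?php
// The pattern from the Source section, condensed into one string (comments removed).
$pattern = '#<meta\s[^>]*content=(["\']??)(.*)\1[^>]*\/?>#isU';

// Made-up sample markup; the charset element has no content attribute.
$html = <<<'HTML'
<meta charset="utf-8">
<meta name="description" content="A short summary">
<meta property='og:title' content='Say "hi"' />
HTML;

preg_match_all( $pattern, $html, $elements );

// The three sub-arrays share the same indexes, one entry per matched element.
foreach ( $elements[0] as $index => $element ) {
	printf(
		"element: %s\nquote:   %s\ncontent: %s\n\n",
		$element,
		$elements[1][ $index ],
		$elements[2][ $index ]
	);
}
```

Index 0 holds the complete elements, index 1 the opening quotation mark of each content attribute, and index 2 the attribute values. The charset element is skipped because it has no content attribute.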

Source

private function get_meta_with_content_elements( $html ) {
	/*
	 * Parse all meta elements with a content attribute.
	 *
	 * Why first search for the content attribute rather than directly searching for name=description element?
	 * tl;dr The content attribute's value will be truncated when it contains a > symbol.
	 *
	 * The content attribute's value (i.e. the description to get) can have HTML in it and be well-formed as
	 * it's a string to the browser. Imagine what happens when attempting to match for the name=description
	 * first. Hmm, if a > or /> symbol is in the content attribute's value, then it terminates the match
	 * as the element's closing symbol. But wait, it's in the content attribute and is not the end of the
	 * element. This is a limitation of using regex. It can't determine "wait a minute this is inside of quotation".
	 * If this happens, what gets matched is not the entire element or all of the content.
	 *
	 * Why not search for the name=description and then content="(.*)"?
	 * The attribute order could be opposite. Plus, additional attributes may exist including being between
	 * the name and content attributes.
	 *
	 * Why not lookahead?
	 * Lookahead is not constrained to stay within the element. The first <meta it finds may not include
	 * the name or content, but rather could be from a different element downstream.
	 */
	$pattern = '#<meta\s' .

			/*
			 * Allows for additional attributes before the content attribute.
			 * Searches for anything other than > symbol.
			 */
			'[^>]*' .

			/*
			* Find the content attribute. When found, capture its value (.*).
			*
			* Allows for (a) single or double quotes and (b) whitespace in the value.
			*
			* Why capture the opening quotation mark, i.e. (["\']), and then backreference,
			* i.e. \1, for the closing quotation mark?
			* To ensure the closing quotation mark matches the opening one. Why? Attribute values
			* can contain quotation marks, such as an apostrophe in the content.
			*/
			'content=(["\']??)(.*)\1' .

			/*
			* Allows for additional attributes after the content attribute.
			* Searches for anything other than > symbol.
			*/
			'[^>]*' .

			/*
			* \/?> searches for the closing > symbol, which can be in either /> or > format.
			* # ends the pattern.
			*/
			'\/?>#' .

			/*
			* These are the options:
			* - i : case insensitive
			* - s : allows newline characters for the . match (needed for multiline elements)
			* - U : non-greedy matching
			*/
			'isU';

	preg_match_all( $pattern, $html, $elements );

	return $elements;
}
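
To make the reasoning in the comments concrete, the following sketch compares a name-first regex with the content-first pattern on a description that contains a > symbol. The name-first pattern, the sample HTML, and the helper demo_find_description() are hypothetical and written only for illustration; they are not part of WordPress. The second step, filtering the matched elements by name=description, is an assumption about how a caller might use the result.

```php
<?php
// Made-up element whose content value contains a > symbol.
$html = '<meta name="description" content="Use a > b to compare">';

// Hypothetical name-first pattern: [^>]* stops at the > inside the value.
$naive = '#<meta\s[^>]*name=["\']description["\'][^>]*>#i';
preg_match( $naive, $html, $naive_match );
echo $naive_match[0], "\n"; // The element is cut off at the first > symbol.

// Content-first pattern from the source above, condensed into one string.
$pattern = '#<meta\s[^>]*content=(["\']??)(.*)\1[^>]*\/?>#isU';
preg_match_all( $pattern, $html, $elements );

/**
 * Hypothetical helper: returns the content value of the first matched
 * element that carries a name=description attribute, or '' if none does.
 */
function demo_find_description( array $elements ) {
	foreach ( $elements[0] as $index => $element ) {
		if ( preg_match( '#name=(["\']?)description\1#i', $element ) ) {
			return trim( $elements[2][ $index ] );
		}
	}
	return '';
}

$description = demo_find_description( $elements );
echo $description, "\n";
```

Because the content value is captured between matching quotation marks before anything else is inspected, the > inside the value no longer ends the match, and the name attribute can then be checked on the complete element regardless of attribute order.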


Used by	Description
WP_REST_URL_Details_Controller::parse_url_details() (wp-includes/rest-api/endpoints/class-wp-rest-url-details-controller.php)	Retrieves the contents of the title tag from the HTML response.

Changelog

Version	Description
5.9.0	Introduced.
