Google AMP: Replacing Embedded Content
The struggle to make the site Google AMP-compliant continues. Today, I fix another issue on my site: embedded content.
Google's new initiative called Google AMP (Accelerated Mobile Pages) focuses on stripping out extra fluff around your web pages to create mobile versions. Each AMP page should load 2x-4x faster than normal.
Since I started this particular mission to convert my content pages over to AMP (covered in three earlier posts), I've been watching my site through the Google Search Console (Search Appearance -> Accelerated Mobile Pages) to see if everything is proceeding as planned.
The error count started climbing again, to the point where I am now seeing 301 errors on my site.
One of the errors Google flagged involved my embedded content, such as YouTube videos and Google Trends charts.
The pages in question were:
- How to Successfully Mock HttpContext (Embedded Content: Google Trends)
- Collection: Real-World Refactoring (Embedded Content: YouTube.com)
Of course, these two issues are only a small part of the significant amount of work most webmasters face to become Google AMP-compliant. Some may need to fix thousands of web pages containing embedded content.
I only have about 400 pages in total, and probably only a small number of them contain embedded content, but these two are great to focus on for now until I can find more through Google's Search Console.
These two pages contain different kinds of embedded content. The Google Trends content is a script, while the YouTube content is an iframe (eek!).
Let's continue to modify our UseAmpImage attribute and rename it.
AmpPage Attribute
Since we aren't only modifying images, I renamed the attribute from UseAmpImage to AmpPage. So now it will look like this on our controller:
[AmpPage]
public class AmpController: Controller
Now for our issue with embedded content.
Perform The Ole Switcheroo
Since this embedded content could be essential to getting a point across, we need a way to reference that content instead of simply removing it. If we remove it, the page may no longer make its point or deliver the content readers were expecting.
For our AMP pages, I decided I want to replace the embedded content with a link to the actual content.
Yet, how do we grab the URL from the embedded content?
We can't do that automatically, but we can add an attribute to the embedded content element that points the filter in the right direction. The attribute uses the HTML5 "data-" prefix, like this: <script data-link="https://www.youtube.com/watch?v=u1Ds9CeG-VY" ...></script>
For example, YouTube embeds use iframes, so I add a data-link attribute to the iframe, telling my ActionFilter to replace it with an AMP-compliant, mobile-readable link. For the Google Trends content, I add the same attribute to the script tag.
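To make the markup side concrete, here is a minimal sketch of what the two embeds might look like once they are tagged, along with a quick check that lists the data-link values a filter could pick up. Only the data-link attribute and the YouTube watch URL come from this post; the example.com URLs, the DataLinkSample class, and the FindDataLinks method are placeholders I made up for illustration. It assumes HtmlAgilityPack is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // install-package HtmlAgilityPack

public static class DataLinkSample
{
    // Hypothetical sample markup: a tagged iframe, a tagged script,
    // and an untagged script that should be ignored.
    public const string Markup =
        "<iframe src=\"https://www.youtube.com/embed/u1Ds9CeG-VY\" " +
        "data-link=\"https://www.youtube.com/watch?v=u1Ds9CeG-VY\"></iframe>" +
        "<script src=\"https://example.com/trends-embed.js\" " +
        "data-link=\"https://example.com/trends-report\"></script>" +
        "<script src=\"https://example.com/analytics.js\"></script>";

    // Returns the data-link value of every <tag> element that has one.
    public static List<string> FindDataLinks(string html, string tag)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants(tag)
            .Where(node => node.Attributes["data-link"] != null)
            .Select(node => node.Attributes["data-link"].Value)
            .ToList();
    }
}
```

Calling DataLinkSample.FindDataLinks(DataLinkSample.Markup, "script") should return only the tagged script's link, since the untagged script has no data-link attribute and is skipped.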
So let's get started.
Previously on the OnResultExecuted
In the previous Google Amp posts, we had an OnResultExecuted event that would transform our images into Google Amp images.
public override void OnResultExecuted(ResultExecutedContext filterContext)
{
    var response = _stringBuilder.ToString();

    // Change images to Amp-specific images
    response = UpdateAmpImages(response);

    _output.Write(response);
}
All we need is another method to replace embedded content with a link. Here is the new method:
private string ReplaceWithLink(string tag, string response)
{
    var doc = GetHtmlDocument(response);
    var elements = doc.DocumentNode.Descendants(tag);
    foreach (var htmlNode in elements)
    {
        if (htmlNode.Attributes["data-link"] == null) continue;

        var dataLink = htmlNode.Attributes["data-link"].Value;
        var paragraph = doc.CreateElement("p");

        var text = String.Format("[Embedded Link] {0}", dataLink);

        var anchor = doc.CreateElement("a");
        anchor.InnerHtml = text;
        anchor.Attributes.Add("href", dataLink);
        anchor.Attributes.Add("title", text);
        paragraph.InnerHtml = anchor.OuterHtml;

        var original = htmlNode.OuterHtml;
        var replacement = paragraph.OuterHtml;

        response = response.Replace(original, replacement);
    }

    return response;
}
The first two lines should look familiar, but after that, the code focuses on replacing our embedded content with a link to the content.
The tag passed in can be "iframe" or "script"; the method looks for every element with that tag name in the document.
The concept is to find out if this tag has a "data-link" attribute attached. Once we know there is a data link attribute, we can grab that and create our replacement content.
We even make the user aware that the content is embedded by adding "[Embedded Link]" to the link title and text.
Once that's finished, we perform the replace.
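As an aside, the same swap can also be done inside the parsed document instead of with a string replace on the response. This is only a sketch of that alternative, not the code my filter uses: the EmbeddedContentReplacer class is a hypothetical name, and it assumes HtmlAgilityPack is installed. One trade-off to keep in mind is that the returned markup is HtmlAgilityPack's serialization of the page, which can differ slightly from the original text.

```csharp
using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // install-package HtmlAgilityPack

public static class EmbeddedContentReplacer
{
    // Replaces every <tag> element carrying a data-link attribute
    // with <p><a href="...">[Embedded Link] ...</a></p>.
    public static string ReplaceWithLink(string tag, string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy the matches to a list first, because the loop below
        // modifies the tree we would otherwise still be enumerating.
        var embeds = doc.DocumentNode
            .Descendants(tag)
            .Where(node => node.Attributes["data-link"] != null)
            .ToList();

        foreach (var embed in embeds)
        {
            var dataLink = embed.Attributes["data-link"].Value;
            var text = "[Embedded Link] " + dataLink;

            var anchor = doc.CreateElement("a");
            anchor.SetAttributeValue("href", dataLink);
            anchor.SetAttributeValue("title", text);
            // Encode the visible text so a stray < or & cannot break the markup.
            anchor.InnerHtml = WebUtility.HtmlEncode(text);

            var paragraph = doc.CreateElement("p");
            paragraph.AppendChild(anchor);

            // Swap the embed for the paragraph in place.
            embed.ParentNode.ReplaceChild(paragraph, embed);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Elements without a data-link attribute are left alone, so ordinary scripts and iframes pass through unchanged, just as they do in the string-replace version.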
The Finished Product
Here is what our finished AmpPage attribute looks like:
ActionFilter\AmpPageAttribute.cs
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Web;
using System.Web.Mvc;
using System.Web.UI;
using HtmlAgilityPack; // install-package HtmlAgilityPack

public class AmpPageAttribute : ActionFilterAttribute
{
    private HtmlTextWriter _htmlTextWriter;
    private StringWriter _stringWriter;
    private StringBuilder _stringBuilder;
    private HttpWriter _output;

    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // Capture the rendered output in a StringBuilder so it can be rewritten later.
        _stringBuilder = new StringBuilder();
        _stringWriter = new StringWriter(_stringBuilder);
        _htmlTextWriter = new HtmlTextWriter(_stringWriter);
        _output = (HttpWriter)filterContext.RequestContext.HttpContext.Response.Output;
        filterContext.RequestContext.HttpContext.Response.Output = _htmlTextWriter;
    }

    public override void OnResultExecuted(ResultExecutedContext filterContext)
    {
        var response = _stringBuilder.ToString();

        // Change images to Amp-specific images
        response = UpdateAmpImages(response);

        // For AMP pages, change Script content to a link.
        response = ReplaceWithLink("script", response);

        // For AMP pages, change iFrame content to a link.
        response = ReplaceWithLink("iframe", response);

        // Write the rewritten HTML to the original response output.
        _output.Write(response);
    }

    private string ReplaceWithLink(string tag, string response)
    {
        var doc = GetHtmlDocument(response);
        var elements = doc.DocumentNode.Descendants(tag);
        foreach (var htmlNode in elements)
        {
            // Only elements tagged with data-link are replaced.
            if (htmlNode.Attributes["data-link"] == null) continue;

            var dataLink = htmlNode.Attributes["data-link"].Value;
            var paragraph = doc.CreateElement("p");

            var text = String.Format("[Embedded Link] {0}", dataLink);

            var anchor = doc.CreateElement("a");
            anchor.InnerHtml = text;
            anchor.Attributes.Add("href", dataLink);
            anchor.Attributes.Add("title", text);
            paragraph.InnerHtml = anchor.OuterHtml;

            var original = htmlNode.OuterHtml;
            var replacement = paragraph.OuterHtml;

            response = response.Replace(original, replacement);
        }

        return response;
    }

    private string UpdateAmpImages(string response)
    {
        // Use HtmlAgilityPack (install-package HtmlAgilityPack)
        var doc = GetHtmlDocument(response);
        var imageList = doc.DocumentNode.Descendants("img");

        const string ampImage = "amp-img";

        if (!imageList.Any()) return response;

        if (!HtmlNode.ElementsFlags.ContainsKey("amp-img"))
        {
            HtmlNode.ElementsFlags.Add("amp-img", HtmlElementFlag.Closed);
        }

        foreach (var imgTag in imageList)
        {
            var original = imgTag.OuterHtml;
            var replacement = imgTag.Clone();
            replacement.Name = ampImage;
            // The CMS adds a caption attribute to images; remove it.
            replacement.Attributes.Remove("caption");
            response = response.Replace(original, replacement.OuterHtml);
        }

        return response;
    }

    private HtmlDocument GetHtmlDocument(string htmlContent)
    {
        var doc = new HtmlDocument
        {
            OptionOutputAsXml = true,
            OptionDefaultStreamEncoding = Encoding.UTF8
        };
        doc.LoadHtml(htmlContent);

        return doc;
    }
}
If we look at the Collection: Real-World Refactoring AMP page, it looks like this:
But once we add the data-link attribute to the YouTube iframe, the resulting AMP page looks like this:
This SHOULD make our embedded content Amp-compliant with Google.
Conclusion
As we continue down this Google AMP path, I am experiencing more and more challenges with my site. Luckily, it speaks volumes about the software developers at Microsoft that MVC is flexible enough to accommodate these kinds of changes in our HTML.
Also, the observant among you will notice that in the UpdateAmpImages method, I included a line that removes the caption from images. I hadn't realized that my CMS automatically adds a caption attribute whenever I add an image. This line removes it.
As you can see, by using the HtmlAgilityPack to parse HTML, we can perform any kind of filtering we need on our HTML before it is sent to the browser.
I'm sure I won't be done anytime soon, but I will keep my eyes on the Search Console and prepare additional content for those looking to make their web pages Google Amp-compliant.
For those converting your web pages into Google Amp pages, did Google make you aware of something you missed in your web pages? Post your comments below to discuss. I'm interested (Or is everyone waiting until I'm done to use the code?) ;-)