What Is %20 in HTML?
by Annik Stahl
If you're new to HTML -- HyperText Markup Language, the language of the Web -- you're bound to run across certain pieces of code that may stump you. The "%20" in Web addresses (or URLs) is one that can be demystified pretty easily: it simply stands for a blank space.
A (Very) Brief History of HTML
In 1989, at a most unlikely spot -- CERN, the European Laboratory for Particle Physics -- a young software engineer named Tim Berners-Lee had an idea. He wanted to create a place where researchers and scientists from all over the world could pool information, collaborate, and even link to and from one another’s resources. This was the germ of the idea that became the World Wide Web. Creating the language to enable and implement these global “hyper-texted” links and to define the structure and layout of this new invention took a few more years. The first public description of HTML appeared in 1991, and a draft specification followed in 1993.
Browsers Read Character Sets
Today, just about everything you see on a Web page is rendered correctly because the browser you’re using knows which “character set” or “character encoding” your HTML page uses. Unicode is the most modern character set (it covers the widest range of characters and languages), but its characters can’t be placed directly in a URL, because URLs are sent using only a subset of ASCII, the most basic of all character encodings. ASCII consists of English letters, the digits 0 through 9, and some symbols and control codes.
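To make the difference between ASCII and the wider Unicode range concrete, here is a minimal Python sketch. The helper name describe and the sample strings are made up for illustration; the sketch only reports whether a string fits in ASCII and which bytes it becomes when stored as UTF-8.

```python
def describe(text: str) -> str:
    """Report whether text is pure ASCII and show its UTF-8 bytes in hex."""
    kind = "ASCII" if text.isascii() else "non-ASCII"
    utf8_hex = text.encode("utf-8").hex(" ")
    return f"{text!r}: {kind}, UTF-8 bytes = {utf8_hex}"


# "page1" uses only ASCII characters, one byte each.
print(describe("page1"))
# "café" contains é, which is outside ASCII and takes two bytes in UTF-8.
print(describe("café"))
```

Every ASCII character fits in a single byte, which is part of why it remains the common denominator for URLs.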
ASCII Was Here First
What if you want something in your URL that isn’t allowed to appear there as-is? It is percent-encoded so that the URL is valid: the character is replaced with a percent sign followed by the two-digit hexadecimal value of its byte. An example is the dollar sign -- $ -- which is an ASCII character but is reserved in URLs. For instance, if you formatted your URL to be http://www.WeCanSaveYou$$$, the encoded form would be http://www.WeCanSaveYou%24%24%24.
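The conversion described above can be tried out with Python's standard urllib.parse module. This is only a sketch: the helper name percent_encode is made up for illustration, and passing safe="" is a choice made here so that every character outside letters, digits, and "_.-~" gets encoded. Browsers and servers may apply looser rules depending on which part of the URL a character sits in.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str) -> str:
    """Percent-encode everything except letters, digits, and _ . - ~"""
    return quote(text, safe="")


encoded = percent_encode("WeCanSaveYou$$$")
print(encoded)                   # WeCanSaveYou%24%24%24

# Decoding reverses the process.
print(unquote(encoded))          # WeCanSaveYou$$$

# The two digits after % are the character's code in hexadecimal.
print(format(ord("$"), "02X"))   # 24
```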
Even Spaces Count
The most common character that can’t appear as-is in a URL is the plain space. A space is ASCII character 32, which is 20 in hexadecimal, so in a URL it’s replaced with the code %20. This shouldn’t be confused with the non-breaking space used in HTML documents. A non-breaking space (a space that prevents an automatic line break) is created by the character entity "&nbsp;" (without the quotes). Website designers use non-breaking spaces to retain white space on a page, because browsers normally collapse runs of ordinary spaces, tabs, and line breaks into a single space. The non-breaking space is a different character from the plain space: the entity has no meaning in a URL, and the character itself would be encoded as %C2%A0 rather than %20.
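A short Python sketch shows where the "20" comes from and how a plain space differs from a non-breaking one once encoded. The helper name encode_segment and the file name "my page.html" are hypothetical, chosen only for this example.

```python
from urllib.parse import quote, unquote


def encode_segment(text: str) -> str:
    """Percent-encode one piece of a URL, such as a file name."""
    return quote(text, safe="")


# A space is character 32, which is 20 in hexadecimal.
print(format(ord(" "), "02X"))        # 20

# So a space in a file name becomes %20.
print(encode_segment("my page.html"))  # my%20page.html

# A non-breaking space (U+00A0) is a different character with different bytes.
print(encode_segment("\u00a0"))        # %C2%A0

# Decoding turns %20 back into a space.
print(unquote("my%20page.html"))       # my page.html
```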
Photo credit: Alex Slobodkin/iStock/Getty Images