What Are Python Raw Strings?

If you’ve ever come across a standard string literal prefixed with either the lowercase letter r or the uppercase letter R, then you’ve encountered a Python raw string:

Python
      
>>> r"This is a raw string"
'This is a raw string'

Although a raw string looks and behaves mostly the same as a normal string literal, there’s an important difference in how Python interprets some of its characters, which you’ll explore in this tutorial.

Notice that there’s nothing special about the resulting string object. Whether you declare your literal value using the r prefix or not, you’ll always end up with a regular Python str object.

Other prefixes available at your fingertips, which you can use and sometimes even mix together in your Python string literals, include:

b: Bytes literal
f: Formatted string literal
u: Legacy Unicode string literal (PEP 414)

Out of those, you might be most familiar with f-strings, which let you evaluate expressions inside string literals. Raw strings aren’t as popular as f-strings, but they do have their own uses that can improve your code’s readability.
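As a brief sketch of mixing prefixes, the snippet below combines r with f and with b. The variable names (word, pattern, raw_bytes) are made up for illustration only:

```python
word = "Python"

# rf: raw + formatted. Backslashes stay literal, while {word} is still filled in.
pattern = rf"\b{word}\b"
print(pattern)

# rb: raw + bytes. The result is a bytes object containing a literal backslash.
raw_bytes = rb"\d+"
print(type(pattern).__name__, type(raw_bytes).__name__)
```

The first literal still evaluates to a regular str, while the second one gives you a bytes object.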

Creating a string of characters is often one of the first skills that you learn when studying a new programming language. The Python Basics book and learning path cover this topic right at the beginning. With Python, you can define string literals in your source code by delimiting the text with either single quotes (') or double quotes ("):

Python
      
>>> david = 'She said "I love you" to me.'
>>> alice = "Oh, that's wonderful to hear!"

Having such a choice can help you avoid a syntax error when your text includes one of those delimiting characters (' or "). For example, if you need to represent an apostrophe in a string, then you can enclose your text in double quotes. Alternatively, you can use multiline strings to mix both types of delimiters in the text.

You may use triple quotes (''' or """) to declare a multiline string literal that can accommodate a longer piece of text, such as an excerpt from the Zen of Python:

Python

>>> poem = """
... Beautiful is better than ugly.
... Explicit is better than implicit.
... Simple is better than complex.
... Complex is better than complicated.
... """

Multiline string literals can optionally act as docstrings, a useful form of code documentation in Python. Docstrings can include bare-bones test cases known as doctests, as well.

Regardless of the delimiter type of your choice, you can always prepend a prefix to your string literal. Just make sure there’s no space between the prefix letters and the opening quote.

When you use the letter r as the prefix, you’ll turn the corresponding string literal into a raw string counterpart. So, what are Python raw strings exactly?

In Short: Python Raw Strings Ignore Escape Character Sequences

In some cases, defining a string through the raw string literal will produce precisely the same result as using the standard string literal in Python:

Python
      
>>> r"I love you" == "I love you"
True

Here, both literals represent string objects that share a common value: the text I love you. Even though the first literal comes with a prefix, it has no effect on the outcome, so both strings compare as equal.

To observe the real difference between raw and standard string literals in Python, consider a different example depicting a date formatted as a string:

Python
      
>>> r"10\25\1991" == "10\25\1991"
False

This time, the comparison turns out to be false even though the two string literals look visually similar. Unlike before, the resulting string objects no longer contain the same sequence of characters. The raw string’s prefix (r) changes the meaning of special character sequences that begin with a backslash (\) inside the literal.
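One way to see what changed is to inspect the two strings character by character. The sketch below uses illustrative variable names (standard, raw) and only built-in functions:

```python
standard = "10\25\1991"
raw = r"10\25\1991"

# The raw literal keeps all ten characters, backslashes included.
print(len(raw))

# In the standard literal, \25 and \1 are octal escapes. Each one collapses
# into a single control character, so only seven characters remain.
print(len(standard))
print([hex(ord(char)) for char in standard])
```

The two control characters in the standard string are invisible when printed, which is why the literals look alike but compare as different.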

Note: To understand how Python interprets the above string, head over to the final section of this tutorial, where you’ll cover the most common types of escape sequences in Python.

The backslash is an escape character, which marks the start of an escape character sequence within a Python string literal. It allows you to encode non-printable characters, such as the line break, control characters like the ANSI escape codes for colors and text formatting, and foreign letters and emojis, among others.

When you print a normal string literal that includes an escape character sequence, such as backslash followed by the letter n, Python doesn’t treat these two characters literally. Instead, it interprets them as a single command and performs the corresponding action:

Python
      
>>> print("Hello\nWorld")
Hello
World

In this case, it moves to a new line after encountering the newline character sequence (\n).

On the other hand, throwing the r prefix onto that same string literal will disable the default treatment of such escape character sequences:

Python
      
>>> print(r"Hello\nWorld")
Hello\nWorld

Python prints your raw string literal without considering \n a special character sequence anymore. In other words, a raw string literal always looks exactly as it’ll be printed, while a standard string literal may not.

Raw strings are a convenient tool in your arsenal, but they’re not the only way to disable the special meaning of escape character sequences. It’s worth knowing that you can escape the backslash itself in standard string literals to suppress its peculiar behavior:

Python
      
>>> print("Hello\\nWorld")
Hello\nWorld

Here, the double backslash (\\) becomes yet another escape character sequence, which Python interprets as a literal backslash in the resulting string. Therefore, you can manage to achieve the desired outcome without using raw strings.

In fact, when you evaluate a raw string literal in the Python REPL, the interpreter automatically escapes each backslash in the shown output:

Python
      
>>> r"Hello\nWorld"
'Hello\\nWorld'

This is the canonical way of representing backslash characters in Python strings. Remember that raw strings only exist as literals in your source code. Once you evaluate them at runtime, they become regular string objects indistinguishable from other strings defined using alternative methods.
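To check that claim for yourself, you might compare a raw literal with its escaped counterpart. This is a minimal sketch with made-up variable names:

```python
raw = r"Hello\nWorld"
escaped = "Hello\\nWorld"

print(raw == escaped)    # both literals produce the same characters
print(type(raw) is str)  # there's no separate "raw string" type at runtime
print(len(raw))          # the backslash and "n" are two separate characters
print("\n" in raw)       # no actual newline character is present
```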

The concept of raw strings isn’t unique to Python. It addresses a common problem in programming that frequently arises when you need to include many literal backslashes in a string. For example, LaTeX markup uses backslashes generously throughout its syntax:

Python

text1 = "\\phi = \\\\ \\frac{1 + \\sqrt{5}}{2}"
text2 = r"\phi = \\ \frac{1 + \sqrt{5}}{2}"

Look how unreadable the first string literal looks compared to the raw string literal below it. With a standard string literal, you must escape each backslash by adding another backslash, which can lead to a problem known as the leaning toothpick syndrome. Raw strings simplify this by treating each backslash as a literal character instead of an escape character.

The two most common scenarios in real life where you might want to use raw strings are regular expressions and Windows file paths. You’ll take a look at the latter first, as it’s a more straightforward use case to understand.

How Can Raw Strings Help You Specify File Paths on Windows?

The family of Microsoft Windows operating systems, and their earlier DOS predecessor, use the backslash character (\) as the path separator symbol. The backslash signifies the boundary between a directory name and a subdirectory or file name in a path.

For example, the path C:\Users\Real Python\main.py corresponds to the following hierarchy in the Windows file system:

C:
└── Users
    └── Real Python
        └── main.py

Each line in the tree above represents an individual component of this path. The first line is the drive letter (C:). The second line is the Users folder, followed by the specific user’s subfolder and a file named main.py inside that subfolder.

Now, you can’t just write down such a path using the standard string literal because the Windows path separator would conflict with the escape character in Python. Depending on the exact escape character sequence at hand, this can merely cause Python to emit a warning or to raise a full-blown syntax error:

Python

>>> documents = "C:\Documents"
<stdin>:1: SyntaxWarning: invalid escape sequence '\D'

>>> documents
'C:\\Documents'

>>> users = "C:\Users"
  File "<stdin>", line 1
    ...
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Even though Python doesn’t recognize \D as a valid escape character sequence, it happily accepts it and even escapes the backslash for you. However, you shouldn’t rely on this behavior because it’ll change in a future Python release, causing an exception instead of displaying a warning message:

Changed in version 3.12: Unrecognized escape sequences produce a SyntaxWarning. In a future Python version they will be eventually a SyntaxError.

On the other hand, escape sequences that start with \U are reserved for Unicode code points that must follow a specific format, as you’ll learn later. If they don’t conform to that format, then Python will raise an exception and stop running your code.
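If you'd like to observe both outcomes without crashing a script, one option is to hand the offending source code to the built-in compile() function as text. The sketch below is only an illustration: the "<example>" file name and the variable names are arbitrary, and the warning category you see depends on your Python version.

```python
import warnings

# The source code is passed as text so that the failures can be observed safely.
bad_unicode = r'"C:\Users"'
unknown_escape = r'"C:\Documents"'

unicode_error = None
try:
    compile(bad_unicode, "<example>", "eval")
except SyntaxError as error:
    unicode_error = error
    print("SyntaxError:", error.msg)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        compile(unknown_escape, "<example>", "eval")
    except SyntaxError:
        # Possible on a future Python release, per the documentation quoted above.
        print("The unrecognized escape sequence was rejected")
    for warning in caught:
        print(warning.category.__name__, warning.message)
```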

To properly represent a Windows path as a string literal, you can either manually escape each backslash character or use a raw string literal:

Python

path1 = "C:\\Users\\Real Python\\main.py"
path2 = r"C:\Users\Real Python\main.py"

Either way, each backslash ends up in the resulting string as a literal character rather than as the start of an unintended escape sequence.

Note that none of these methods are considered Pythonic or idiomatic to Python because they encourage you to hard-code values that may not be portable. In modern Python, you’d typically want to define your paths using the pathlib module, which takes care of translating the path separator between the major file systems:

Python
      
from pathlib import Path

path = Path.home() / "main.py"

This ensures that your code will continue working on different operating systems. Here’s what the resulting path variable will evaluate to on Windows and on a Unix-like system compliant with the POSIX standard:

Windows: WindowsPath('C:/Users/Real Python/main.py')
Unix-like: PosixPath('/home/Real Python/main.py')

When you call .open() on the corresponding path object, it’ll correctly locate the current user’s folder and open the specified file, no matter what operating system you’re on. Python will translate the forward slash (/) if necessary.
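To preview how pathlib treats separators on a platform other than your own, you can reach for its pure path classes, which never touch the file system. The following sketch reuses the sample folders from above as plain examples:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Forward slashes work as input for both flavors of path.
windows_path = PureWindowsPath("C:/Users/Real Python") / "main.py"
posix_path = PurePosixPath("/home/Real Python") / "main.py"

print(windows_path)  # rendered with backslashes when converted to a string
print(posix_path)
print(windows_path.parts)

# A raw string with backslashes describes the very same Windows path.
print(windows_path == PureWindowsPath(r"C:\Users\Real Python\main.py"))
```

Because these classes only manipulate text, the snippet behaves the same way regardless of the operating system you run it on.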

As you can see, Python offers better ways to deal with the offending path separator. In practice, you’re more likely to use raw strings when working with regular expressions, which you’ll explore now.

How Can Raw Strings Help You Write Regular Expressions?

A regular expression, or regex for short, is a formal expression written in a standard mini-language that lets you specify text patterns to search, extract, or modify. Many text editors, including Sublime Text, provide the option to find and replace text using regular expressions, enabling advanced pattern matching and manipulation capabilities.

For example, here’s a sample regex that matches the opening tags, such as <div class="dark-theme">, inside an HTML document:

Text

<\w+[^>]+>

Don’t worry if you can’t make sense of it. The bottom line is that regular expressions typically contain a number of special characters, including the dreaded backslash. As a result, they can cause problems when you want to represent them in Python string literals.
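For a quick taste of that pattern in action, here's a small sketch that runs it against a made-up HTML snippet using Python's re module. The sample markup and variable names are assumptions chosen for illustration:

```python
import re

html = '<div class="dark-theme"><span id="x">Hi</span></div>'

# The raw string lets you write the regex exactly as shown above.
opening_tags = re.findall(r"<\w+[^>]+>", html)
print(opening_tags)
```

The closing tags are skipped because the slash that follows the opening angle bracket isn't a word character.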

The following examples illustrate the most common use cases for regular expressions in programming:

Web scraping: Collecting all email addresses from a website
Data validation: Checking if the password is secure enough
Data anonymization: Masking sensitive information like credit card numbers
Content moderation: Removing offensive words from user comments

While you can achieve these goals using traditional programming techniques, regular expressions provide several benefits:

Declarative style
Compact and portable syntax
Unparalleled performance

A regular expression describes the what rather than the how. In other words, it represents a pattern to look for, while the underlying regex engine generates highly efficient code to handle the details. Moreover, you can describe really complex patterns that would be challenging to implement by hand. For instance, you’re able to match dynamic content by capturing and referring to parts of text within the same regular expression!
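Referring back to captured text might look like the sketch below, which finds accidentally repeated words. The sample sentence is invented, and the pattern is just one possible way to express the idea:

```python
import re

text = "This is is a test of the the parser"

# (\w+) captures a word, and \1 refers back to whatever that group matched.
repeated = re.findall(r"\b(\w+) \1\b", text)
print(repeated)

# Without the r prefix, "\1" would be an octal escape for a control character.
print("\1" == "\x01")
```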

The syntax of regular expressions is a double-edged sword. As a form of a domain-specific language (DSL), it’s very efficient, but at the same time, its brevity often contributes to poor readability. What’s more, the same symbol can take different meanings depending on where in the expression you place it!

Have a look at this extreme yet syntactically correct and working email address validation regex to get an idea. It comprises a lot of special characters, making it look like a jumble of hieroglyphics or an esoteric programming language.

Note: There are two major dialects of the regular expression syntax in use today. Command-line tools like grep adhere to the POSIX-style regex syntax by default. On the other hand, many programming languages stick to the slightly more sophisticated syntax borrowed from the Perl scripting language.

While Perl’s syntax remains mostly universal, some programming languages introduce slight variations, so you may need to adjust for that when moving your regex from one language to another.

Finally, regular expressions offer excellent performance, which can be hard to beat with your custom implementation in pure Python. Still, you can achieve even better results with Python bindings for third-party libraries, such as Hyperscan by Intel.

In the context of regular expressions, using Python raw strings is considered a best practice even when you don’t necessarily need them. They absolve you from worrying about the potential conflicts between the regex syntax and Python’s escape character sequences. Raw strings let you think in terms of the regex syntax, regardless of how complicated your regular expression becomes in the future.

More specifically, raw string literals can help you avoid the following problems when you work with regular expressions:

Problem	Symbol	Escape Sequence	Regular Expression
Conflicting meaning	`\n`	Render a line break	Match the non-printable newline character
False friends	`\b`	Move the cursor back one character	Match a word boundary
Invalid syntax	`\d`	Not applicable	Match any digit character

The regular expression syntax shares a few symbols with Python’s escape character sequences. Some symbols refer to the same concept but in a different context, while others remain false friends. Other symbols have a specific meaning within regular expressions but result in an invalid Python string literal.

When you use one of these or a similar symbol in a standard string literal without escaping the backslash character, you may not be able to properly represent the expected regular expression:

Python
      
>>> import re
>>> text = "Pythonic means idiomatic in Python."
>>> re.findall("Python\b", text)
[]

In this code example, the string literal "Python\b" contains the word Python followed by the non-printable backspace character (\b), which isn’t present in the text to search through. As a result, re.findall() returns an empty list.
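You can confirm what each literal really contains by checking its length and last characters. The variable names in this sketch are illustrative:

```python
standard = "Python\b"
raw = r"Python\b"

# The standard literal ends with a single backspace character (code point 8).
print(len(standard), hex(ord(standard[-1])))

# The raw literal ends with two characters: a backslash and the letter b.
print(len(raw), raw[-2:])
```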

On the other hand, when you escape this special character sequence (\\b), it becomes the literal part of the string. The regular expression that it represents can now match the word boundary at the end of the sentence:

Python
      
>>> re.findall("Python\\b", text)
['Python']

Unfortunately, escaping becomes particularly prone to the leaning toothpick syndrome mentioned earlier when combined with regular expressions. Therefore, you’re better off using Python’s raw string literals in the first place:

Python
      
>>> re.findall(r"Python\b", text)
['Python']

This code works as expected, and your string literal looks much cleaner. Although this example may not show a spectacular improvement, using raw strings becomes more important as your regular expressions get more complicated.

Note: Remember that raw strings can only help with string literals defined in Python source code. If you load your regular expression from a file or elsewhere, then you don’t need to take any extra steps because the resulting string object will already be in the right format.
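The sketch below simulates that situation by saving a pattern to a temporary file and reading it back. The pattern.txt file name is hypothetical, and the escaped literal passed to .write_text() only serves to put the characters Python\b into the file:

```python
import re
import tempfile
from pathlib import Path

text = "Pythonic means idiomatic in Python."

with tempfile.TemporaryDirectory() as folder:
    pattern_file = Path(folder) / "pattern.txt"  # hypothetical file name
    # Simulate a file that someone saved with the characters: Python\b
    pattern_file.write_text("Python\\b", encoding="utf-8")
    loaded = pattern_file.read_text(encoding="utf-8")

# The loaded string already holds a literal backslash followed by b.
print(len(loaded))
print(re.findall(loaded, text))
```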

At this point, you have a pretty good idea about the benefits that raw string literals bring to Python. However, that isn’t to say they’re without their own set of challenges. In the next section, you’ll learn when to be careful about using them.

What Should You Watch Out for When Using Raw Strings?

A single raw string literal may have alternative visual representations on the screen depending on how you treat it, which can be confusing at times. For example, when you print such a literal, the result looks straightforward:

Python
      
>>> print(r"\\")
\\

The text that appears in the output corresponds to the literal value enclosed in the double quotes, even when it contains the backslash character. That’s the main idea behind raw string literals, after all.

However, when you work in the interactive Python shell, also known as the Python REPL, you have the option of previewing the visual representation of expressions, such as string literals, without printing them:

Python
      
>>> r"\\"
'\\\\'

This is known as the evaluation of expressions. Evaluating a string literal results in creating a new instance of the Python str data type. When displaying the evaluated string, the REPL shows you the object’s string representation in the form of a standard string literal rather than its bare contents. In that notation, each literal backslash appears as two backslashes, so you see four instead of two.

That’s the canonical representation of string objects in Python, which you can copy and paste into your source code. This representation is equivalent to your earlier raw string literal:

Python
      
>>> r"\\" == "\\\\"
True

>>> print("\\\\")
\\

>>> len("\\\\")
2

As you can see, raw and standard string literals offer alternative ways of encoding the same value. Despite the four backslashes in the standard string literal, the underlying string object stores only two characters in memory.

Note: Things get even more complicated when you call the built-in repr() function on your raw string literal to obtain its printable representation:

Python
      
>>> repr(r"\\")
"'\\\\\\\\'"

>>> print(repr(r"\\"))
'\\\\'

This might be useful for debugging purposes, as it gets you a string that accurately encodes your original string literal, including the quotes around it. During development, you could temporarily substitute expensive computations in your source code with the snapshot of a particular string object obtained this way.
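As a sketch of such a round trip, you could turn the snapshot back into a string object with ast.literal_eval() from the standard library. The sample path is reused from earlier, and the variable names are illustrative:

```python
import ast

original = r"C:\Users\Real Python\main.py"

# repr() returns source code for an equivalent standard string literal.
snapshot = repr(original)
print(snapshot)

# Evaluating that snapshot safely recreates an equal string object.
restored = ast.literal_eval(snapshot)
print(restored == original)
```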

Another challenge that might take you by surprise is the presence of trailing backslashes in your string literals. Even though Python raw strings allow you to use literal backslashes, there’s one exception to this rule:

Python

>>> r"\"
  File "<stdin>", line 1
    ...
SyntaxError: unterminated string literal (detected at line 1)

>>> r"\\"
'\\\\'

>>> r"\\\"
  File "<stdin>", line 1
    ...
SyntaxError: unterminated string literal (detected at line 1)

>>> r"\\\\"
'\\\\\\\\'

>>> r"\\\\\"
  File "<stdin>", line 1
    ...
SyntaxError: unterminated string literal (detected at line 1)

Whether you use standard or raw string literals, they can’t end with an odd number of consecutive backslash characters because that would result in a syntax error. Such a string literal gets interpreted as unterminated due to an unclosed quotation mark.

This behavior comes from the way Python tokenizes string literals. Even in a raw string, a backslash placed directly before a quote stops that quote from terminating the literal, although the backslash itself stays in the resulting string. When an odd number of backslashes sits at the very end of the literal, the last one pairs up with the closing quote, so Python keeps looking for a terminating quote that never comes.

So, if you place the same sequence elsewhere in your raw string literal, then it’ll appear in literal form:

Python
      
>>> print(r"The sequence \" is treated literally")
The sequence \" is treated literally

The number of consecutive backslash characters must be even only at the end of the string literal. You can use an odd number of consecutive backslash characters anywhere else in the string:

Python
      
>>> print(r"\\\w+")
\\\w+

In this case, the three backslash characters are followed by ordinary letters and symbols, so the string literal doesn’t end with an odd number of backslashes.

This particular edge case can affect raw string literals representing directory paths on Windows that end with a single trailing backslash:

Python
      
>>> r"C:\Users\Real Python\"
  File "<stdin>", line 1
    ...
SyntaxError: unterminated string literal (detected at line 1)

You could work around it using a dirty hack, for example, by appending a space to the string and stripping it away:

Python
      
>>> r"C:\Users\Real Python\ ".rstrip()
'C:\\Users\\Real Python\\'

However, using the pathlib module instead of strings to deal with file paths is usually a better choice.
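If you do need the trailing backslash in a plain string, another option worth considering is to place a short standard literal right after the raw one. This is only a sketch of that alternative, relying on Python's implicit concatenation of adjacent string literals:

```python
# Adjacent string literals are joined into one string at compile time.
folder = r"C:\Users\Real Python" "\\"
print(folder)

# Explicit concatenation with the plus operator gives the same result.
same_folder = r"C:\Users\Real Python" + "\\"
print(folder == same_folder)
```

Unlike the .rstrip() trick, this approach doesn't strip any whitespace that legitimately belongs at the end of the string.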

While this limitation is common to both raw and standard string literals, the challenge of nesting quotation marks within a literal is unique to raw strings. With standard string literals, you can always escape the single or double quote to avoid a conflict with the enclosing string delimiter:

Python
      
>>> "She said \"I love you\" to me."
'She said "I love you" to me.'

>>> 'Oh, that\'s wonderful to hear!'
"Oh, that's wonderful to hear!"

Here, the sequences \" and \' allow the quotes to become part of the string without causing a syntax error. Notice how Python automatically flips the enclosing quotes to simplify the canonical string representation when showing the evaluated objects.

In contrast, using identical sequences in a raw string literal keeps the backslashes in the resulting string, causing them to show up in the output:

Python
      
>>> r"She said \"I love you\" to me."
'She said \\"I love you\\" to me.'

>>> print(r"She said \"I love you\" to me.")
She said \"I love you\" to me.

However, this is less of a problem because you can always put your text between triple quotes despite creating a single-line string literal:

Python
      
>>> r"""She said "I love you" to me."""
'She said "I love you" to me.'

Naturally, you could replace the triple quotation mark (""") with the triple apostrophe (''').

What could be a more annoying problem is the lack of ability to escape Unicode characters in raw string literals. In particular, you can’t use Unicode literals or Unicode placeholders in raw strings because raw strings don’t process escape sequences:

Python

>>> print("\u00e9", r"\u00e9")
é \u00e9

>>> print(
...     "\N{latin small letter e with acute}",
...     r"\N{latin small letter e with acute}",
... )
é \N{latin small letter e with acute}

Escape sequences starting with \u and \U let you represent foreign letters and symbols using their numeric Unicode code points, while a sequence that begins with \N allows you to refer to those letters and symbols by name. But you can’t use these encoding techniques in raw string literals because they treat the backslash character literally.

There’s no good way to mitigate this problem. If you really need to encode Unicode characters using one of these escape sequences, then you can concatenate your raw string literal with a standard string literal, like so:

Python
      
>>> print(r"C:\caf" + "\u00e9.txt")
C:\café.txt

Although not pretty, it does the trick. This is somewhat similar to the work-around for the trailing backslash character in directory paths that you saw earlier.
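Two other spellings produce the same string, which you may or may not find more readable. The sketch below assumes that your source file is saved as UTF-8, which is the default encoding for Python 3 source code:

```python
concatenated = r"C:\caf" + "\u00e9.txt"

# Adjacent literals are joined automatically, so the plus operator is optional.
adjacent = r"C:\caf" "\u00e9.txt"

# You can also type the character itself directly into the raw string.
direct = r"C:\café.txt"

print(concatenated == adjacent == direct)
```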

Okay, now that you know when and how to use a raw string literal in Python, you may be wondering if its sister feature, the raw bytes literal, has any purpose. You’ll discover more on this in the following section.

When Should You Choose Raw Bytes Over Raw String Literals?

Apart from defining raw string literals in Python, you can specify equivalent raw bytes literals using the rb or br prefix—or their uppercase counterparts. To understand what they’re good for, it helps to revisit or familiarize yourself with the regular bytes object first.

A bytes instance looks and behaves much like a string, but it represents a sequence of numeric bytes instead of characters. You can define a bytes literal by prefixing your ordinary string literal with the letter b. The only reservation is that you’re limited to using ASCII characters within your bytes literal. To encode non-ASCII characters, you typically use relevant escape character sequences.

For example, here’s the word café encoded as UTF-8 bytes:

Python
      
>>> "café".encode("utf-8")
b'caf\xc3\xa9'

>>> list(b"caf\xc3\xa9")
[99, 97, 102, 195, 169]

You can preview the individual byte values by passing your bytes-like object into the list() constructor. Because the letter é doesn’t have an ASCII representation, it requires two bytes in the UTF-8 character encoding. You must escape these two bytes using their ordinal values, most commonly in the hexadecimal system.

The need for using such escape character sequences seemingly defeats the purpose of raw bytes literals. Python would treat the backslash character literally, preventing you from inserting the necessary escape sequences into the bytes literal. However, raw bytes literals can occasionally become useful when you’re dealing with binary data that mostly consists of ASCII letters.

For instance, the requests package can provide the body of an HTTP message as bytes rather than a string. Should you want to search through such undecoded content of a website using regular expressions, defining your patterns with raw bytes literals almost becomes a necessity:

Python

>>> import re
>>> import requests
>>> response = requests.get("https://realpython.com/")
>>> re.findall(rb"<(\w+)\b[^>]+>", response.content)
[b'html', b'link', b'meta', ..., b'script']

The combination of r and b prefixes in front of the regex pattern creates a bytes literal with the special treatment of escape character sequences disabled.
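You can try the same idea without a network connection by searching through a hand-written bytes object. The markup below is a made-up stand-in for a real response body:

```python
import re

content = b'<html lang="en"><meta charset="utf-8"><p class="x">caf\xc3\xa9</p></html>'

# The raw bytes pattern finds the names of opening tags that carry attributes.
tag_names = re.findall(rb"<(\w+)\b[^>]+>", content)
print(tag_names)

# In a bytes pattern, \w matches ASCII word characters only, so the two
# bytes that encode the accented letter aren't treated as part of a word.
print(re.findall(rb"\w+", "café".encode("utf-8")))
print(re.findall(r"\w+", "café"))
```

The last two lines hint at a subtle difference: character classes like \w are Unicode-aware in string patterns but not in bytes patterns.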

On the other hand, using a non-raw bytes literal—prefixed with just b—would require you to manually escape some of the regex symbols, compromising on readability. Otherwise, you’d get warnings, or worse, your regex might not work as intended:

Python
      
>>> re.findall(b"<(\\w+)\\b[^>]+>", response.content)
[b'html', b'link', b'meta', ..., b'script']

>>> re.findall(b"<(\w+)\b[^>]+>", response.content)
<stdin>:1: SyntaxWarning: invalid escape sequence '\w'
[]

The first bytes literal works correctly but doesn’t look as neat as its raw counterpart, while the second one finds no matches at all and produces a warning message.

What about using plain-old raw string literals? As it turns out, you can’t mix string and bytes objects in Python:

Python

>>> re.findall(r"<(\w+)\b[^>]+>", response.content)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    ...
TypeError: cannot use a string pattern on a bytes-like object

Although this raw string literal consists of exactly the same ASCII characters as the raw bytes literal that you saw previously, Python treats them differently.
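One way around this error is to make the pattern and the searched content agree on their type. The sketch below shows both directions using a made-up bytes object and assumes that the content is encoded as UTF-8:

```python
import re

content = b'<html lang="en"><p class="intro">caf\xc3\xa9</p></html>'
pattern = r"<(\w+)\b[^>]+>"

# Option 1: decode the bytes and keep using the string pattern.
text_matches = re.findall(pattern, content.decode("utf-8"))
print(text_matches)

# Option 2: encode the ASCII-only pattern and search the bytes directly.
bytes_matches = re.findall(pattern.encode("ascii"), content)
print(bytes_matches)
```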

Note: The requests package can return the body of an HTTP message as a Python string instead of a byte sequence. To get the website’s content as a sequence of characters, you may access the response object’s .text attribute:

Python
      
>>> response.text
'\n\n<!doctype html>\n<html lang="en"> (...) </html>\n'

While more convenient, this method can sometimes result in a malformed string because it relies on the metadata sent by the server to decode the content. If a misconfigured web server sent an incorrect character encoding, then the library would be left guessing.

Another area where raw bytes literals can be desirable is unit testing, which often involves comparing the expected and actual values:

Python
      
>>> "café".encode("unicode_escape")
b'caf\\xe9'

>>> "café".encode("unicode_escape") == rb"caf\xe9"
True

Here, you encode the string café using the unicode_escape codec, which produces a bytes object in which the accented letter is spelled out as the escape sequence \xe9. You then take advantage of a raw bytes literal to compare the actual and expected values without escaping the backslash yourself, which would be necessary if you used a regular bytes literal instead.

That wraps up all you need to know about raw string—and raw bytes—literals in Python. As a bonus, make sure to check out some of the most common escape character sequences below, which you may bump into during your coding journey.

What Are the Common Escape Character Sequences?

The escape sequences in Python are modeled after those supported by standard C, which means they mostly overlap. Therefore, apart from escape sequences for the typical non-printable characters, such as newline (\n) and tabulation (\t), Python lets you use less common ones like the null character (\0), which is often associated with null-terminated strings in C.

Perhaps one of the most unusual escape sequences you can include in your string literals is \a, which represents the bell character. Back in the day, computer terminals had a physical bell that would ring in response to receiving such a control character. Today, when used on some terminal emulators, this sequence triggers an audible alert or sound:

Python
      
>>> print("Ding!\a")

Go ahead and try it now to see if your terminal supports the bell character!

Several escape sequences in Python allow you to represent ordinal values of ASCII characters using either the hexadecimal or octal numeral system. For example, the ordinal value of the asterisk symbol (*) is 42 in decimal, which is equal to 2a₁₆ in hexadecimal and 52₈ in octal:

Python
      
>>> ord("*")
42

>>> hex(42), oct(42)
('0x2a', '0o52')

Here, you call the built-in ord() function to find the ordinal value of a character in Python. The hex() and oct() functions let you convert this decimal integer into strings with the corresponding hexadecimal and octal representations, respectively.

Note that you must format such strings slightly differently in your string literal to turn them into escape sequences:

Python
      
>>> "Hexadecimal: \x2a"
'Hexadecimal: *'

>>> "Octal: \052"
'Octal: *'

The escape sequence of a character’s ordinal value expressed in the hexadecimal system must start with a backslash character followed by the lowercase letter x and exactly two hexadecimal digits (\xhh).

On the other hand, octal escape sequences can have between one and three octal digits (\ooo). You don’t have to pad octal escape sequences with leading zeros, though, when the character’s ordinal value isn’t big enough. The earlier date example ("10\25\1991") took advantage of it.
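The following sketch illustrates the optional zero padding and shows where Python stops reading an octal escape sequence. The variable name is illustrative:

```python
# The padded and unpadded octal forms match the hexadecimal escape sequence.
print("\52" == "\052" == "\x2a" == "*")

# The null character is the shortest octal escape sequence.
print("\0" == chr(0))

# Python stops at the first character that isn't an octal digit, so \1991
# becomes the control character with ordinal value 1 followed by "991".
date = "10\25\1991"
print(date.endswith("\x01991"))
```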

Encoding ordinal values of ASCII characters in string literals could be helpful if a character was missing from your keyboard and there was no equivalent shorthand like \a. Specifically, this allows you to include non-printable control characters from the extended ASCII set:

Python
      
>>> print("Top-Left\x84Bottom-Right")
Top-Left
        Bottom-Right

Bear in mind that how such extended characters are displayed may vary depending on your terminal and its current code page, which defines one of many supersets of the original 7-bit ASCII character table.

However, you’re more likely to encounter hexadecimal escape codes like these in a bytes literal. They encode non-ASCII byte values, which often come in contiguous groups that have a specific meaning together:

Python
      
>>> b"caf\xc3\xa9".decode("utf-8")
'café'

>>> "caf\xc3\xa9"
'cafÃ©'

The two escape sequences, \xc3 and \xa9, correspond to bytes with decimal values of 195 and 169, which together form the UTF-8 encoding for the accented letter é. When you decode this bytes literal into a string, Python replaces such combinations of bytes with an appropriate Unicode character. On the other hand, placing these same escape sequences in a string literal makes Python interpret each one individually as a separate character with that code point, which yields Ã (195) and © (169) instead.
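The difference between the two literals can be checked with a short script. This is only a sketch, and the variable names are made up for the example. It also shows one possible way to repair the garbled string, assuming that every character in it has a code point below 256:

```python
# Encoding the real text produces the two-byte UTF-8 sequence for é.
utf8_bytes = "café".encode("utf-8")
print(utf8_bytes == b"caf\xc3\xa9")  # True

# The same escapes in a str literal create two separate characters.
garbled = "caf\xc3\xa9"
print(len(garbled))                     # 5
print([ord(char) for char in garbled[-2:]])  # [195, 169]

# Latin-1 maps code points 0-255 to identical byte values, so encoding
# with it recovers the original bytes, which then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```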

Fortunately, you can escape Unicode characters in string literals directly using another escape sequence format:

Python
      
>>> ord("é")
233

>>> hex(233)
'0xe9'

>>> "caf\u00e9"
'café'

The \uhhhh format consists of precisely four hexadecimal digits and applies to characters whose code points fit in 16 bits, that is, no greater than 65,535 (U+FFFF). This range is the Basic Multilingual Plane (BMP), which includes letters in the majority of modern alphabets.

To encode characters outside the BMP, such as the snake emoji, you’ll need to use the \Uhhhhhhhh format, comprising exactly eight hexadecimal digits:

Python
      
>>> "\U0001f40d"
'🐍'

Notice that the letter U must now be in uppercase! This prevents the escape sequence from being incorrectly interpreted as the four-digit counterpart.
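To see how the code point decides between the two formats, here’s a minimal sketch. The unicode_escape() helper is hypothetical and isn’t part of Python. It’s compared against the built-in unicode_escape codec, which makes a similar choice but prefers the shorter \xhh form for code points below 256:

```python
def unicode_escape(char: str) -> str:
    """Spell a character as \\uhhhh or \\Uhhhhhhhh text (hypothetical helper)."""
    code = ord(char)
    if code <= 0xFFFF:
        return f"\\u{code:04x}"  # Inside the BMP: four hexadecimal digits
    return f"\\U{code:08x}"      # Outside the BMP: eight hexadecimal digits


print(unicode_escape("é"))   # \u00e9
print(unicode_escape("🐍"))  # \U0001f40d

# The built-in codec produces bytes with escape sequences in them:
print("🐍".encode("unicode_escape"))  # b'\\U0001f40d'
print("é".encode("unicode_escape"))   # b'\\xe9'
```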

If you don’t find these Unicode escape sequences convenient to work with, then you’ll appreciate yet another format. It allows you to use a Unicode name alias instead of the numeric code point to refer to a character:

Python
      
>>> "\N{snake}"
'🐍'

The use of the uppercase letter N makes this escape character sequence distinct from the newline character (\n). You can find the official Unicode name of a given character using the unicodedata module from the standard library, like so:

Python
      
>>> import unicodedata
>>> unicodedata.lookup("snake")
'🐍'
>>> unicodedata.name("🐍")
'SNAKE'

The lookup() function expects a string with the character’s name and returns the corresponding Unicode character, while the name() function takes a character and maps it to a suitable name alias. Note that while lookup() and \N{} are case-insensitive, name() always returns the character’s name in uppercase.
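Both functions can fail, so it may help to see how they behave with unnamed characters and unknown names. The sketch below uses a hypothetical describe() helper and a made-up character name to trigger the errors:

```python
import unicodedata


def describe(char: str, fallback: str = "<unnamed>") -> str:
    """Return the character's Unicode name or a fallback (hypothetical helper)."""
    # The optional second argument to name() suppresses the ValueError
    # that's otherwise raised for characters without a name.
    return unicodedata.name(char, fallback)


print(describe("🐍"))  # SNAKE
print(describe("\a"))  # <unnamed>, because control characters have no name

# The name lookup ignores letter case, just like \N{...} does:
print(unicodedata.lookup("snake") == unicodedata.lookup("SNAKE") == "\N{SNAKE}")  # True

# An unknown name raises KeyError ("no such snake" is a made-up name):
try:
    unicodedata.lookup("no such snake")
except KeyError:
    found = False
else:
    found = True
print(found)  # False
```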

Lastly, a pretty common use case for escape sequences that you might encounter in Python is ANSI escape codes, which control the formatting and display of text in your terminal. For example, the following string literal contains cryptic codes that will make the word really appear in bold red and underlined on terminals that support such markup:

Python
      
>>> print("This is \033[31;1;4mreally\033[0m important.")
This is really important.

Do you see how the escape codes disappear from the output? It’s because this terminal supports ANSI escape codes. Otherwise, some of these characters would appear in literal form. This sometimes happens when you log in to a remote server using a client with no support for ANSI codes or when you redirect the output to a file.
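One way to avoid stray codes in redirected output is to strip them whenever the standard output isn’t an interactive terminal. The following is only a sketch under that assumption. The style() and strip_ansi() helpers are hypothetical, and the pattern only covers the styling codes used in the example above, not every ANSI sequence:

```python
import re
import sys

ESC = "\033"  # The escape character, which is the same as "\x1b"
RESET = f"{ESC}[0m"

# A raw string keeps the backslashes intact for the regex engine.
ANSI_STYLE = re.compile(r"\x1b\[[0-9;]*m")


def style(text: str, *codes: int) -> str:
    """Wrap text in styling codes, such as 31 (red), 1 (bold), 4 (underline)."""
    prefix = f"{ESC}[" + ";".join(str(code) for code in codes) + "m"
    return f"{prefix}{text}{RESET}"


def strip_ansi(text: str) -> str:
    """Remove the styling codes matched by ANSI_STYLE."""
    return ANSI_STYLE.sub("", text)


message = f"This is {style('really', 31, 1, 4)} important."

# Keep the codes only when writing to an interactive terminal.
print(message if sys.stdout.isatty() else strip_ansi(message))
```

Checking sys.stdout.isatty() is a common heuristic rather than a guarantee, since some terminals report themselves as interactive without supporting these codes.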

Note: Check out the corresponding section in the tutorial on the print() function to learn more about using ANSI escape codes in Python.

It’s worth knowing these common escape character sequences, as you might unwittingly try to use them in your string literals without realizing they have a special meaning in Python. Now you know to watch out for them!

Conclusion

In this tutorial, you delved into defining raw string literals in your Python source code. You’re now able to write cleaner and more readable regular expressions, Windows file paths, and many other string literals that deal with escape character sequences.

Along the way, you’ve learned about the most common escape character sequences, their use cases, and potential problems that may arise when using raw strings. You know how Python interprets those sequences depending on which type of string literal you choose. Finally, you compared raw string literals to their raw binary counterparts.
