Python's strings have 47 methods. That's almost as many string methods as there are built-in functions in Python! Which string methods should you learn first?
There are about a dozen string methods that are extremely useful and worth committing to memory. Let's take a look at the most useful string methods and then briefly discuss the remaining methods and why they're less useful.
Here are the dozen-ish Python string methods I recommend committing to memory.
Method | Related Methods | Description |
---|---|---|
join | | Join iterable of strings by a separator |
split | rsplit | Split (on whitespace by default) into list of strings |
replace | | Replace all copies of one substring with another |
strip | rstrip & lstrip | Remove whitespace from the beginning and end |
casefold | lower & upper | Return a case-normalized version of the string |
startswith | | Check if string starts with 1 or more other strings |
endswith | | Check if string ends with 1 or more other strings |
splitlines | | Split into a list of lines |
format | | Format the string (consider an f-string before this) |
count | | Count how many times a given substring occurs |
removeprefix | | Remove the given prefix |
removesuffix | | Remove the given suffix |
You might be wondering "wait why is my favorite method not in that list?" I'll briefly explain the rest of the methods and my thoughts on them below. But first, let's look at each of the above methods.
If you need to convert a list to a string in Python, the string join method is what you're looking for.
>>> colors = ["purple", "blue", "green", "orange"]
>>> joined_colors = ", ".join(colors)
>>> joined_colors
'purple, blue, green, orange'
The join method can concatenate a list of strings into a single string, but it will accept any other iterable of strings as well.
>>> digits = range(10)
>>> digit_string = "".join(str(n) for n in digits)
>>> digit_string
'0123456789'
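One thing worth keeping in mind is that join only accepts strings, which is why the example above calls str on each number. Here's a small sketch of what happens otherwise (the scores list is made up for illustration):

```python
scores = [98, 87, 92]  # hypothetical data, not from the examples above

# join raises a TypeError when the iterable contains non-strings
try:
    ", ".join(scores)
except TypeError as error:
    join_error = error
else:
    join_error = None

# Converting each item with str first makes it work
joined_scores = ", ".join(str(n) for n in scores)
print(joined_scores)  # 98, 87, 92
```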
If you need to break a string into smaller strings based on a separator, you need the string split method.
>>> time = "1:19:48"
>>> parts = time.split(":")
>>> parts
['1', '19', '48']
Your separator can be any substring.
We're splitting by a ":" above, but we could also split by "->":
>>> graph = "A->B->C->D"
>>> graph.split("->")
['A', 'B', 'C', 'D']
You usually wouldn't want to call split with a space character:
>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split(" ")
['Does', 'it', 'dry', 'up\nlike', 'a', 'raisin', 'in', 'the', 'sun?\n']
Splitting on the space character works, but often when splitting on spaces it's actually more useful to split on all whitespace.
Calling the split method with no arguments will split on any run of consecutive whitespace characters:
>>> langston = "Does it dry up\nlike a raisin in the sun?\n"
>>> langston.split()
['Does', 'it', 'dry', 'up', 'like', 'a', 'raisin', 'in', 'the', 'sun?']
Note that split without any arguments also removes leading and trailing whitespace.
There's one more split feature that folks sometimes overlook: the maxsplit argument.
When calling split with a maxsplit value, Python will split the string at most that number of times.
This is handy when you only care about the first one or two occurrences of a separator in a string:
>>> line = "Rubber duck|5|10"
>>> item_name, the_rest = line.split("|", maxsplit=1)
>>> item_name
'Rubber duck'
If it's the last couple occurrences of a separator that you care about, you'll want to use the string rsplit method instead:
>>> the_rest, amount = line.rsplit("|", maxsplit=1)
>>> amount
'10'
With the exception of calling the split method without any arguments, there's no way to ignore repeated separators, to ignore leading or trailing separators, or to support multiple separators at once.
If you need any of those features, you'll want to look into regular expressions (specifically the re.split function).
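As a rough sketch of what that might look like, here's re.split handling mixed separators, repeated separators, and a trailing separator. The raw string and the pattern are just one possible example, not the only way to do it:

```python
import re

# Hypothetical messy input: mixed separators, repeats, and a trailing comma
raw = "apple, banana;;cherry ,  date,"

# Split on runs of commas/semicolons plus any surrounding whitespace
pieces = re.split(r"\s*[,;]+\s*", raw)

# A trailing separator still yields an empty string at the end, so filter it out
fruits = [piece for piece in pieces if piece]
print(fruits)  # ['apple', 'banana', 'cherry', 'date']
```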
Need to replace one substring (a string within a string) with another?
That's what the string replace method is for!
>>> message = "JavaScript is lovely"
>>> message.replace("JavaScript", "Python")
'Python is lovely'
The replace method can also be used for removing substrings, by replacing them with an empty string:
>>> message = "Python is lovely!!!!"
>>> message.replace("!", "")
'Python is lovely'
There's also an optional count argument, in case you only want to replace the first N occurrences:
>>> message = "Python is lovely!!!!"
>>> message.replace("!", "?", 2)
'Python is lovely??!!'
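Since strings are immutable, replace always returns a new string and leaves the original alone. That also means calls can be chained, as in this small sketch:

```python
message = "JavaScript is lovely!!!!"

# Each replace call returns a new string, so calls can be chained
cleaned = message.replace("JavaScript", "Python").replace("!", "")

print(cleaned)  # Python is lovely
print(message)  # JavaScript is lovely!!!! (the original is unchanged)
```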
The strip method is for removing whitespace from the beginning and end of a string:
>>> text = """
... Hello!
... This is a multi-line string.
... """
>>> text
'\nHello!\nThis is a multi-line string.\n'
>>> stripped_text = text.strip()
>>> stripped_text
'Hello!\nThis is a multi-line string.'
If you just need to remove whitespace from the end of the string (but not the beginning), you can use the rstrip method:
>>> line = " Indented line with trailing spaces \n"
>>> line.rstrip()
' Indented line with trailing spaces'
And if you need to strip whitespace from just the beginning, you can use the lstrip method:
>>> line = " Indented line with trailing spaces \n"
>>> line.lstrip()
'Indented line with trailing spaces \n'
Note that by default strip, lstrip, and rstrip remove all whitespace characters (space, tab, newline, etc.).
You can also specify a particular character to remove instead.
Here we're removing any trailing newline characters but leaving other whitespace intact:
>>> line = "Line 1\n"
>>> line
'Line 1\n'
>>> line.rstrip("\n")
'Line 1'
Note that strip, lstrip, and rstrip will also accept a string of multiple characters to strip.
>>> words = ['I', 'enjoy', 'Python!', 'Do', 'you?', 'I', 'hope', 'so.']
>>> [w.strip(".!?") for w in words]
['I', 'enjoy', 'Python', 'Do', 'you', 'I', 'hope', 'so']
Passing multiple characters will strip all of those characters, but they'll be treated as individual characters (not as a substring).
If you need to strip a multi-character substring instead of individual characters, see removesuffix and removeprefix below.
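The difference between stripping characters and stripping a substring is easy to trip over. Here's a quick sketch with a made-up filename (removesuffix requires Python 3.9 or newer):

```python
filename = "report.txt"

# rstrip treats ".txt" as the set of characters '.', 't', and 'x',
# so it also eats the final "t" of "report"
stripped = filename.rstrip(".txt")
print(stripped)  # repor

# removesuffix removes the exact substring, at most once
without_extension = filename.removesuffix(".txt")
print(without_extension)  # report
```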
Need to uppercase a string?
There's an upper method for that:
>>> name = "Trey"
>>> name.upper()
'TREY'
Need to lowercase a string?
There's a lower method for that:
>>> name = "Trey"
>>> name.lower()
'trey'
What if you're trying to do a case-insensitive comparison between strings?
You could lowercase or uppercase all of your strings for the comparison.
Or you could use the string casefold method:
>>> name = "Trey"
>>> "t" in name
False
>>> "t" in name.casefold()
True
But wait, isn't casefold just the same thing as lower?
>>> name = "Trey"
>>> name.casefold()
'trey'
Almost.
If you're working with ASCII characters, casefold does exactly the same thing as the string lower method.
But if you have non-ASCII characters (see Unicode character encodings in Python), there are some characters that casefold handles uniquely.
There are a few hundred characters that normalize differently between the lower and casefold methods.
If you're working with text using the International Phonetic Alphabet or text written in Greek, Cyrillic, Armenian, Cherokee, and a large handful of other languages, you should probably use casefold instead of lower.
Do keep in mind that casefold doesn't solve all text normalization issues though.
It's possible to represent the same data in multiple ways in Python, so you'll need to look into Unicode data normalization and Python's unicodedata module if you think you'll be comparing non-ASCII text often.
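To make both points concrete, here's a minimal sketch. The normalize_for_comparison helper is a name I made up for this example, and normalizing to NFC before case-folding is a simplifying assumption: it handles common cases, but it isn't a complete caseless-matching algorithm.

```python
import unicodedata


def normalize_for_comparison(text):
    """Hypothetical helper: Unicode-normalize (NFC), then case-fold."""
    return unicodedata.normalize("NFC", text).casefold()


# casefold maps the German sharp s to "ss"; lower leaves it alone
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse

# "é" can be a single code point or "e" followed by a combining accent
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)  # False
same_after_normalizing = (
    normalize_for_comparison(composed) == normalize_for_comparison(decomposed)
)
print(same_after_normalizing)  # True
```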
The string startswith method can check whether one string is a prefix of another string:
>>> property_id = "UA-1234567"
>>> property_id.startswith("UA-")
True
The alternative to startswith is to slice the bigger string and do an equality check:
>>> property_id = "UA-1234567"
>>> prefix = "UA-"
>>> property_id[:len(prefix)] == prefix
True
That works, but it's awkward.
You can also quickly check whether one string starts with any of several different substrings by passing a tuple of substrings to startswith.
Here we're checking whether each string in a list starts with a vowel to determine whether the article "an" or "a" should be used:
>>> names = ["Go", "Elixir", "OCaml", "Rust"]
>>> for name in names:
...     if name.startswith(("A", "E", "I", "O", "U")):
...         print(f"An {name} program")
...     else:
...         print(f"A {name} program")
...
A Go program
An Elixir program
An OCaml program
A Rust program
Note that startswith returns True if the string starts with any of the given substrings.
Many long-time Python programmers often overlook the fact that startswith will accept either a single string or a tuple of strings.
The endswith method can check whether one string is a suffix of another string.
The string endswith method works pretty much like the startswith method.
It works with a single string:
>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith(".min.js")
True
But it also accepts a tuple of strings:
>>> filename = "3c9a9fd05f404aefa92817650be58036.min.js"
>>> filename.endswith((".min.js", ".min.css"))
True
Just as with startswith, when endswith is given a tuple, it returns True if our string ends with any of the strings in that tuple.
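One catch is that it really does need to be a tuple: a list of strings isn't accepted. Here's a short sketch using made-up filenames:

```python
filenames = ["app.min.js", "styles.min.css", "README.md", "index.html"]  # made-up names

# A tuple of suffixes matches if the string ends with any one of them
minified = [name for name in filenames if name.endswith((".min.js", ".min.css"))]
print(minified)  # ['app.min.js', 'styles.min.css']

# A list raises a TypeError, so convert it to a tuple first
suffixes = [".md", ".html"]
try:
    "README.md".endswith(suffixes)
except TypeError:
    accepted_list = False
else:
    accepted_list = True

documents = [name for name in filenames if name.endswith(tuple(suffixes))]
print(documents)  # ['README.md', 'index.html']
```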
The splitlines method is specifically for splitting up strings into lines.
>>> text = "I'm Nobody! Who are you?\nAre you – Nobody – too?"
>>> text.splitlines()
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']
Why make a separate method just for splitting into lines?
Couldn't we just use the split method with \n instead?
>>> text.split("\n")
["I'm Nobody! Who are you?", 'Are you – Nobody – too?']
While that does work in some cases, sometimes newlines are represented by \r\n or simply \r instead of \n.
If you don't know exactly what line endings your text uses, splitlines can be handy.
>>> text = "Maybe it just sags\r\nlike a heavy load.\r\nOr does it explode?"
>>> text.split("\n")
['Maybe it just sags\r', 'like a heavy load.\r', 'Or does it explode?']
>>> text.splitlines()
['Maybe it just sags', 'like a heavy load.', 'Or does it explode?']
But there's an even more useful reason to use splitlines: it's quite common for text to end in a trailing newline character.
>>> zen = "Flat is better than nested.\nSparse is better than dense.\n"
The splitlines method will remove a trailing newline if it finds one, whereas the split method will split on that trailing newline, which would give us an empty line at the end (likely not what we actually want when splitting on lines).
>>> zen.split("\n")
['Flat is better than nested.', 'Sparse is better than dense.', '']
>>> zen.splitlines()
['Flat is better than nested.', 'Sparse is better than dense.']
Unlike split, the splitlines method can also split lines while maintaining the existing line endings by specifying keepends=True:
>>> zen.splitlines(keepends=True)
['Flat is better than nested.\n', 'Sparse is better than dense.\n']
When splitting strings into lines in Python, I recommend reaching for splitlines instead of split.
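Here's one more sketch that puts those differences side by side, using a made-up string that mixes all three line-ending styles and ends with a newline:

```python
mixed = "first\nsecond\r\nthird\rfourth\n"

# split("\n") leaves stray "\r" characters, misses the lone "\r",
# and adds a trailing empty string
split_result = mixed.split("\n")
print(split_result)  # ['first', 'second\r', 'third\rfourth', '']

# splitlines handles all three conventions and drops the final empty entry
lines = mixed.splitlines()
print(lines)  # ['first', 'second', 'third', 'fourth']
```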
Python's format method is used for string formatting (a.k.a. string interpolation).
>>> version_message = "Version {version} or higher required."
>>> print(version_message.format(version="3.10"))
Version 3.10 or higher required.
Python's f-strings were an evolution of the format method.
>>> name = "Trey"
>>> print(f"Hello {name}! Welcome to Python.")
Hello Trey! Welcome to Python.
You might think that the format method doesn't have much use now that f-strings have long been part of Python.
But the format method is handy for cases where you'd like to define your template string in one part of your code and use that template string in another part.
For example, we might define a string-to-be-formatted at the top of a module and then use that string later on in our module:
BASE_URL = "https://api.stackexchange.com/2.3/questions/{ids}?site={site}"
# More code here
question_ids = ["33809864", "2759323", "9321955"]
url_for_questions = BASE_URL.format(
    site="stackoverflow",
    ids=";".join(question_ids),
)
We've predefined our BASE_URL template string and then later used it to construct a valid URL with the format method.
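The same idea works any time one template gets filled in more than once. Here's a small sketch; the GREETING template and the greet function are made up for illustration:

```python
# Hypothetical template, defined once and reused with different values
GREETING = "Hello {name}! You have {count} new {noun}."


def greet(name, count):
    # Pick the noun before filling in the template
    noun = "message" if count == 1 else "messages"
    return GREETING.format(name=name, count=count, noun=noun)


print(greet("Trey", 1))  # Hello Trey! You have 1 new message.
print(greet("Ada", 3))   # Hello Ada! You have 3 new messages.
```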
The string count method accepts a substring and returns the number of times that substring occurs within our string:
>>> time = "3:32"
>>> time.count(":")
1
>>> time = "2:17:48"
>>> time.count(":")
2
That's it. The count method is pretty simple.
Note that if you don't care about the actual number but instead care whether the count is greater than 0:
has_underscores = text.count("_") > 0
You don't need the count method.
Why?
Because Python's in operator is a better way to check whether a string contains a substring:
has_underscores = "_" in text
This has the added benefit that the in operator will stop as soon as it finds a match, whereas count always needs to iterate through the entire string.
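One detail that sometimes surprises people is that count only counts non-overlapping occurrences. A quick sketch with made-up strings:

```python
text = "aaaa"

# Matches do not overlap: "aa" is counted at index 0 and index 2 only
pairs = text.count("aa")
print(pairs)  # 2

# Containment is the clearer check when the exact number doesn't matter
has_underscores = "_" in "snake_case_name"
print(has_underscores)  # True
```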
The removeprefix method will remove an optional prefix from the beginning of a string.
>>> hex_string = "0xfe34"
>>> hex_string.removeprefix("0x")
'fe34'
>>> hex_string = "ac6b"
>>> hex_string.removeprefix("0x")
'ac6b'
The removeprefix method was added in Python 3.9.
Before removeprefix, it was common to check whether a string startswith a prefix and then remove it via slicing:
if hex_string.startswith("0x"):
    hex_string = hex_string[len("0x"):]
Now you can just use removeprefix instead:
hex_string = hex_string.removeprefix("0x")
The removeprefix method is a bit similar to the lstrip method, except that lstrip removes single characters from the beginning of a string and it removes as many as it finds.
So while this will remove all leading v characters from the beginning of a string:
>>> a = "v3.11.0"
>>> a.lstrip("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.lstrip("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.lstrip("v")
'3.11.0'
This would remove at most one v from the beginning of the string:
>>> a = "v3.11.0"
>>> a.removeprefix("v")
'3.11.0'
>>> b = "3.11.0"
>>> b.removeprefix("v")
'3.11.0'
>>> c = "vvv3.11.0"
>>> c.removeprefix("v")
'vv3.11.0'
The removesuffix method will remove an optional suffix from the end of a string.
>>> time_readings = ["0", "5 sec", "7 sec", "1", "8 sec"]
>>> new_readings = [t.removesuffix(" sec") for t in time_readings]
>>> new_readings
['0', '5', '7', '1', '8']
It does pretty much the same thing as removeprefix, except it removes from the end instead of removing from the beginning.
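If you're stuck on a Python version older than 3.9, both methods are easy to approximate. Here's a rough sketch; the remove_prefix and remove_suffix function names are made up for this example and aren't part of Python:

```python
def remove_prefix(string, prefix):
    """Hypothetical stand-in for str.removeprefix (Python 3.8 and earlier)."""
    if prefix and string.startswith(prefix):
        return string[len(prefix):]
    return string


def remove_suffix(string, suffix):
    """Hypothetical stand-in for str.removesuffix (Python 3.8 and earlier)."""
    # The emptiness check matters: string[:-0] would return an empty string
    if suffix and string.endswith(suffix):
        return string[:-len(suffix)]
    return string


print(remove_prefix("vvv3.11.0", "v"))  # vv3.11.0
print(remove_suffix("5 sec", " sec"))   # 5
print(remove_suffix("1", " sec"))       # 1
```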
I wouldn't memorize these string methods today, but you might consider eventually looking into them.
Method | Related Methods | Description |
---|---|---|
encode | | Encode string to bytes object |
find | rfind | Return index of substring or -1 if not found |
index | rindex | Return index of substring or raise ValueError |
title | capitalize | Title-case the string |
partition | rpartition | Partition into 3 parts based on a separator |
ljust | rjust & center | Left/right/center-justify the string |
zfill | | Pad numeric string with zeroes (up to a width) |
isidentifier | | Check if string is a valid Python identifier |
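To make a few of the methods in that table concrete, here's a quick sketch of partition, zfill, and isidentifier in action. The is_usable_name helper is a name I made up for illustration:

```python
import keyword

# partition always returns 3 parts; the last two are empty if nothing was split
key, separator, value = "timeout=30".partition("=")
print((key, separator, value))          # ('timeout', '=', '30')
print("no equals sign".partition("="))  # ('no equals sign', '', '')

# zfill and string formatting can both zero-pad to a width
print("42".zfill(5))  # 00042
print(f"{42:05d}")    # 00042


# isidentifier accepts keywords, so pair it with keyword.iskeyword
def is_usable_name(name):
    """Hypothetical helper: a valid identifier that is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_usable_name("total"))  # True
print(is_usable_name("class"))  # False
```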
Here's why I don't recommend committing each of these to memory:
encode: you can usually avoid manually encoding strings, but you'll discover this method by necessity when you can't (see converting between binary data and strings in Python)
find and rfind: we rarely care about finding substring indexes; usually it's containment we want (for example we use 'y' in name instead of name.find('y') != -1)
index and rindex: these raise an exception if the given substring isn't found, so it's rare to see these methods used
title and capitalize: the title method doesn't always work as you'd expect (see Title-casing a string in Python) and capitalize only capitalizes the first character
partition and rpartition: these can be very handy when splitting while checking whether you split, but I find myself using split and rsplit more often
ljust, rjust, and center: these methods left/right/center-justify text and I usually prefer the <, >, and ^ string formatting modifiers instead (see formatting strings)
zfill: this method zero-pads strings to make them a specific width and I usually prefer using string formatting for zero-filling as well (see zero-padding while string formatting)
isidentifier: this is niche but useful for checking that a string is a valid Python identifier, though this usually needs pairing with keyword.iskeyword to exclude Python keywords
These methods are used for asking questions about your strings.
Most of these ask a question about every character in the string, with the exception of the istitle method.
Method | Related Methods | Description |
---|---|---|
isdecimal | isdigit & isnumeric | Check if string represents a number |
isascii | | Check whether all characters are ASCII |
isprintable | | Check whether all characters are printable |
isspace | | Check whether string is entirely whitespace |
isalpha | islower & isupper | Check if string contains only letters |
isalnum | | Check if string contains letters or digits |
istitle | | Check if string is title-cased |
These methods might be useful in very specific circumstances. But when you're asking these sorts of questions, using a regular expression might be more appropriate.
Also keep in mind that these methods might not always act how you might expect.
All of isdigit, isdecimal, and isnumeric match more than just 0 to 9 and none of them match "-" or ".".
The isdigit method matches everything isdecimal matches plus more, and the isnumeric method matches everything that isdigit matches plus more.
So while only isnumeric matches ⅷ, isdigit and isnumeric match ⓾, and all of them match ۸.
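Here's a small sketch for seeing those differences yourself. The sample characters are ones I picked for illustration, and the looks_like_ascii_integer helper is a made-up name showing one possible way to accept only the digits 0 through 9:

```python
# Sample strings chosen for illustration
samples = ["8", "۸", "²", "½", "-1", "1.5"]

for s in samples:
    print(f"{s!r}: isdecimal={s.isdecimal()} isdigit={s.isdigit()} isnumeric={s.isnumeric()}")


def looks_like_ascii_integer(text):
    """Hypothetical helper: True only for non-empty strings of the digits 0-9."""
    return text.isascii() and text.isdecimal()


print(looks_like_ascii_integer("2024"))  # True
print(looks_like_ascii_integer("۸"))     # False
print(looks_like_ascii_integer("-1"))    # False
```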
These 5 methods are pretty rare to see:
expandtabs: convert tab characters into spaces (the number of spaces needed to hit the next 8 character tab stop)
swapcase: convert uppercase to lowercase and lowercase to uppercase
format_map: calling my_string.format_map(mapping) is the same as my_string.format(**mapping)
maketrans: create a dictionary mapping character code points between keys and values (to be passed to str.translate)
translate: map all of one code point to another one in a given string
Python's strings have a ton of methods. It's really not worth memorizing them all: save your time for something more fruitful.
While memorizing everything is a waste of time, it is worth committing the more useful string methods to memory. If a method would be useful pretty much every week, commit it to memory.
I recommend memorizing Python's most useful string methods, roughly in this order:
join: Join iterable of strings by a separator
split: Split (on whitespace by default) into list of strings
replace: Replace all copies of one substring with another
strip: Remove whitespace from the beginning and end
casefold (or lower if you prefer): Return a case-normalized version of the string
startswith & endswith: Check if string starts/ends with 1 or more other strings
splitlines: Split into a list of lines
format: Format the string (consider an f-string before this)
count: Count how many times a given substring occurs
removeprefix & removesuffix: Remove the given prefix/suffix
Want to commit all these string methods to long-term memory? I'm working on a system that could help you do that in 5 minutes per day over about 10 days.
This system could also help you commit many other important Python concepts to memory as well.