Skip to main content

File I/O

What's the use of knowing how to write Python code if it doesn't persist beyond the lifetime of the script's operation? After all, the true power of programming doesn't lie in writing fancy loops or creating functions- it's about using those tools to interact with external data and create programs that have a tangible impact. To do this we need a way for Python to interact with other files or create them. This is what file Input/Output (I/O) is.

File I/O allows your Python script to open, read, write, and close files in your computer's file system, allowing your programs to create persistent data, interact with datasets, configure systems, log information, and even communicate with other programs.

Opening Files

The most important part of file I/O is knowing how to open a file. We can do so with the open() function, which allows you to open a file and returns a file object. Here's how you can use it:

file = open('myfile.txt')

We can then assign the file’s contents to a variable and print it directly to the Python console:

file = open('myfile.txt')
file_content=file.read()
print(file_content)

Files opened with open() need to be closed eventually; otherwise they will continue to take up resources on your machine. This can become an issue if the file is very large or if you have many files open. We can do so with the close() function:

file.close()

Python offers a more efficient and error-proof method to handle file I/O with the with statement- a Python keyword that manages data resources. It automatically takes care of opening and closing files, even if any errors occur during file operations:

# The with statement ensures the file is automatically closed after the block's execution, even if errors occur during its execution.
with open('my_file.txt') as file:
file_contents = file.read()
print(file_contents)

File Modes

By default open() opens files in read mode and text mode- the read mode means that we’re opening the file as ‘read-only’ and text mode means that we’re interpreting the file as Python strings.

We can change the modes by using the mode= parameter. For example, here's how it looks if we explicitly open a file in read mode using the mode= parameter:

with open('my_file.txt', mode='r') as file:
file_contents = file.read()
print(file_contents)

Changing the modes controls how the file is opened and what operations we can perform on it. Here's a table that shows the most commonly used file modes and what they allow you to do:

When you specify the mode= parameter in the open() function, you can control how the file is opened and what operations you can perform on it.

Understanding Text and Binary Modes Text Mode ('t'): Interprets the file as text and automatically decodes the content to a Python string using a specific encoding (usually UTF-8). This mode is suitable for files containing textual data.

Binary Mode ('b'): Treats the file as binary data, returning bytes objects without any decoding. This mode is essential for non-text files, such as images or executable files.

Here's an overview of the most commonly used file modes and what they allow you to do. The tables are split between text mode, which interprets the file as a text and automatically decodes the content into a Python string with a specific encoding (usually UTF-8), and binary mode, which interprets the file as a binary data and is essential for non-text files such as images and executables.

Text Modes

ModeExplanation
r (read-only)Open the file to read its content.
w (write-mode)Writes to a file, replacing the existing file or creating it if it doesn’t exist.
a (append-mode)Appends to a file, creating it if it doesn’t exist.
x (exclusive creation mode)Writes to a file, but fails if the file already exists.
t (text mode)Interprets the file as text, and is the default along with r.
r+ (read and write mode)Combination of r and w modes, starts writing from the beginning of the file and overwrites existing text.
w+ (write and read mode)Combination of w and r mode, replacing the existing file.
a+ (append and read mode)Combination of a and r mode, starts appending at the end of the file.

Binary Modes

ModeExplanation
b (binary mode)Opens the file in binary mode. This mode is used for non-text files such as images and executables.
rb (read binary)Opens the file as read-only in binary format. Useful for reading non-text files.
wb (write binary)Opens the file for writing in binary format. It will create a new file or overwrite an existing file.
ab (append binary)Opens the file for appending in binary format. It will create a new file if it doesn’t exist.
rb+ or r+b (read and write binary)Opens the file for both reading and writing in binary format.
wb+ or w+b (write and read binary)Opens the file for both writing and reading in binary format, overwriting the existing file or creating a new one if it doesn’t exist.
ab+ or a+b (append and read binary)Opens the file for both appending and reading in binary format.

File Paths

In order for your Python script to open files it'll need to be able to find them. By default Python will search its local directory (where the python script currently lives in) for any files you're opening.

Alternatively, can then define an absolute or relative file path for the script to search in.

Absolute Path

An absolute path is the complete directory path from the root of the filesystem to the target file. Using absolute paths ensures that Python can locate the file no matter where your script runs. However, it also makes your code less portable, as the path might not exist on other systems or environments, and you'd need to go and replace the path whenever you move the script.

Here's an example of an absolute path on Windows:

with open('C:\\Users\\Username\\Documents\\my_file.txt', mode='w') as file:
file.write('Hello!')

Relative Path

A relative path specifies the file location in relation to the directory where the Python script is running. Relative paths make your code more portable, assuming the directory structure relative to the script remains constant.

Here's an example of using a relative path on Windows:

file_path = 'C:\\Users\\Username\\Documents\\my_file.txt'
with open(file_path, mode='w') as file:
file.write('Hello!')

Operating Systems and Paths

Unfortunately, the task of creating Python programs that work across various Operating Systems (OS) and softwares can be quite complicated due to

Unfortunately, the task of creating Python programs that are compatible with different operating systems (OS) and softwares can be quite complicated due to certain core differences. One such difference is in how they handle file paths. Window paths use backslashes \, while Unix-based systems like Linux and macOS use forward slashes /. Then there's the number of slashes to use, differences in handling root directories, case sensitivity, and more.

Fortunately, Python provides built-in tools and techniques that help abstract away many of these OS-specific complexities, making file I/O operations more uniform across different environments.

The os module

To address path compatibility issues, Python's standard library includes the os module, which allows you to create OS-agnostic file paths by using the os.path.join() function. This function automatically detects and uses the correct file path separator for the OS on which the Python script is running, allowing your script to be run on any OS without conflicts.

Managing paths with os

Example of os.path.join on Windows:

import os

# Automatically uses backslashes (\) on Windows
file_path = os.path.join('my_directory', 'my_file.txt')
print(file_path) # Outputs: my_directory\my_file.txt

And on Linux:

import os

# Automatically uses backslashes (/) on Linux
file_path = os.path.join('my_directory', 'my_file.txt')
print(file_path) # Outputs: my_directory/my_file.txt

The os module can accomplish many other tasks, most of them beyond the scope of this tutorial, but here are two basic functions that may be immediately useful:

Creating a new directory with os

import os

# Creating a new directory
os.mkdir('new_folder')

# Creating a directory tree
os.makedirs('path/to/new_folder', exist_ok=True)

Setting up and retrieving environment variables with os

Environment variables are the settings that your Python script may use to interact with the operating system environment, such as file paths, system configurations, and user preferences. The os module allows you to access and modify these variables:

Retrieving Environment Variables You can retrieve an environment variable using os.environ.get(). This method returns the value of the specified environment variable if it exists, or None if it does not.

import os

# Retrieving an environment variable (e.g., PATH)
path_variable = os.environ.get('PATH')
print(path_variable)

Setting Environment Variables To set an environment variable for the duration of your script, you can use os.environ directly like a dictionary. Note that changes made to environment variables using os.environ only affect the current process (your Python script) and its child processes; they do not persist after the script ends.

import os

# Setting an environment variable
os.environ['MY_CUSTOM_VAR'] = 'SomeValue'
print(os.environ.get('MY_CUSTOM_VAR'))

The pathlib module

The pathlib module goes a step further than os, giving you a path object that has its own methods and properties.This helps make your code cleaner and more Python, and removes the syntax differences that you may need with the os module.

Basic Usage Here's a basic example of how to use the pathlib module to create a Path object and perform common file path operations:

from pathlib import Path

# Creating a Path object in the current directory
current_directory = Path('.')

# Listing all files and directories in the current directory
for item in current_directory.iterdir():
print(item)

# Creating a new directory
new_directory = current_directory / 'new_folder'
new_directory.mkdir(exist_ok=True)

# Creating a Path object for a new file in the new directory
new_file = new_directory / 'my_new_file.txt'

# Writing to the file
new_file.write_text('Hello, pathlib!')

# Reading from the file
print(new_file.read_text())

from pathlib import Path

# Creating a Path object for file pathing
file_path = Path('my_directory') / 'my_file.txt'
print(file_path) # On Windows: my_directory\my_file.txt; On Unix-based systems: my_directory/my_file.txt

Conclusion

This tutorial just briefly touched upon the complexities of file I/O in Python, from the essential techniques for reading from and writing to files, handling different file modes, and working with file paths across different OSes.

File I/O can become extremely complicated due to requirements of various applications, the need for comprehensive error handling, and the intricacies of file handling. To make this journey more manageable, here are a couple of tips:

  • Error Handling: Make coding in robust error handling a habit. One one is to use try-except blocks to catch and handle exceptions that may arise during file operations. This practice is vital for building resilient applications that can handle unexpected situations gracefully.

  • Explore Python's I/O Libraries: Python offers a host of specialized modules for opening files. Examples include- csv for reading and writing CSV files and json for JSON data. Each library is designed to simplify working with specific data formats, so be sure to check before you begin!

  • Security Practices: Be cautious about the files your program opens, especially if the file paths or contents come from an external source. Validate file paths and handle user-generated data carefully to avoid security vulnerabilities, such as path traversal attacks.

  • Utilize Resources and Community: The Python community is vast and supportive. The most popular modules and libraries tend to have online resources- such as forums, documentation, and code repositories- that cna help you. Knowing how to google information is a valuable skill with Python, so always go search if you're stuck with something!