
brunocampos01/understanding-the-python-ecosystem


Understanding the Python Ecosystem


This project focuses on understanding the language ecosystem, not getting into programming details.

Summary

🌄 Python's Habitat

This topic describes how to set up the environment for Python development.


🐍 Python's Taxonomy

This topic describes the typical structure and conventions of Python projects.


💢 Python's Behavior

This topic describes how the language is designed and how it works.


🐛 Python's Feeding

This topic describes static code analysis, formatting patterns and style guides.


🔍 Python's Other Features

Extra topics.






Preparing the Environment for the Python

Linux

Python needs a set of tools that are system requirements. If necessary, install these requirements with this command:

sudo apt update

sudo apt install \
  software-properties-common \
  build-essential \
  libffi-dev \
  python3-pip \
  python3-dev \
  python3-venv \
  python3-setuptools \
  python3-pkg-resources

Now, the environment is ready to install Python:

sudo apt install python3

Windows

On Windows, I recommend using the package manager chocolatey and running PowerShell as administrator.

Now, install Python

choco install python 

Test

python --version 



Check Python Configuration

Check current version

python --version

Check where Python is installed

which python

Check which Python versions are installed

sudo update-alternatives --list python


Advanced settings of Python

Install multiple Python versions

Sometimes you might work on different projects simultaneously that require different versions of Python. Normally, using Anaconda is the easiest solution; however, it has restrictions.
  1. Add repository


    This PPA contains more recent Python versions packaged for Ubuntu.

    sudo add-apt-repository ppa:deadsnakes/ppa -y
  2. Update packages

    sudo apt update -y
  3. Check which Python version is installed

    python --version
  4. Install Python

    sudo apt install python3.<VERSION>

Install multiple Python versions using pyenv
  1. Add dependencies

    sudo apt install curl -y
  2. Update packages

    sudo apt update -y
  3. Install pyenv

    curl https://pyenv.run | bash
  4. Add these three lines to .bashrc or .zshrc

    export PATH="$HOME/.pyenv/bin:$PATH"
    eval "$(pyenv init --path)"
    eval "$(pyenv virtualenv-init -)"
  5. Open a new terminal and execute

    exec $SHELL
    pyenv --version

Change system's Python

Before installing other versions of Python, it's necessary to set which system Python will be used.

  1. Use update-alternatives

    It's possible to use the update-alternatives command to set priorities for different versions of the same software installed on Ubuntu systems. Now, define the priority of the versions:

    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.13 1
    
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.12 2
     
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 3
    
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.10 4

    A symbolic link will be created in /usr/bin: /usr/bin/python -> /etc/alternatives/python*

  2. Choose version

    sudo update-alternatives --config python
  3. Test

    python --version

Change Python2 to Python3

If python --version returns Python 2, try setting an alias in /home/$USER/.bashrc, as in this example.

alias python=python3

NOTE: The important thing to realize is that Python 3 is not backwards compatible with Python 2. This means that if you try to run Python 2 code as Python 3, it will probably break.
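A minimal sketch of two of the most visible incompatibilities, the print statement and integer division:

```python
# Python 2: `print "hello"` was a statement; in Python 3 it is a function
# and the Python 2 form is a SyntaxError.
print("hello")

# Python 2: 7 / 2 == 3 (floor division); Python 3 uses true division.
assert 7 / 2 == 3.5
assert 7 // 2 == 3  # explicit floor division behaves the same in both
```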


Set Python's Environment Variables
  • PYTHONPATH is an environment variable which you can set to add additional directories where Python will look for modules and packages. Example: Apache Airflow reads the dag/ folder and automatically adds any file that is in this directory.
  • PYTHONHOME tells the interpreter where to find the standard library.

Set PYTHONPATH
  1. Open profile

    vim ~/.bashrc
  2. Insert the PYTHONPATH export (the directory below is a placeholder)

    export PYTHONPATH="$PYTHONPATH:<YOUR_MODULES_DIR>"
  3. Update profile/bashrc

    source ~/.bashrc
  4. Test

    >>> import sys
    >>> from pprint import pprint
    >>> pprint(sys.path)
    ['',
     '/usr/lib/python311.zip',
     '/usr/lib/python3.11',
     '/usr/lib/python3.11/lib-dynload',
     '/usr/local/lib/python3.11/dist-packages',
     '/usr/lib/python3/dist-packages']

    Example with Apache Airflow

    >>> import sys
    >>> from pprint import pprint
    >>> pprint(sys.path)
    ['',
     '/home/project_name/dags',
     '/home/project_name/config',
     '/home/project_name/utilities',
     ...
     ]



What is a virtual environment and how it works

Python can run in a virtual environment with isolation from the system.


Architecture of Execution

Virtualenv enables us to create multiple Python environments which are isolated from the global Python environment as well as from each other.


When Python starts, it examines the path of its binary. In a virtual environment, that binary is actually just a copy of, or symbolic link to, your system's Python binary. Python then sets the sys.prefix location, which is used to locate site-packages (third-party packages/libraries).


Symbolic link

  • sys.prefix points to the virtual environment directory.
  • sys.base_prefix points to the non-virtual environment.
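This makes it easy to detect from code whether the interpreter is running inside a virtual environment, a minimal sketch:

```python
import sys

# Inside a virtual environment, sys.prefix differs from sys.base_prefix;
# outside one, they point to the same directory.
in_venv = sys.prefix != sys.base_prefix
print(f"prefix:      {sys.prefix}")
print(f"base_prefix: {sys.base_prefix}")
print(f"running in a virtual environment: {in_venv}")
```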

Folder of virtual environment

ll

# random.py -> /usr/lib/python3.11/random.py
# reprlib.py -> /usr/lib/python3.11/reprlib.py
# re.py -> /usr/lib/python3.11/re.py
# ...
tree

├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── pip
│   ├── pip3
│   ├── pip3.11
│   ├── python -> python3.11
│   ├── python3 -> python3.11
│   └── python3.11 -> /usr/bin/python3.11
├── include
├── lib
│   └── python3.11
│       └── site-packages
└── pyvenv.cfg
Create Virtual Environment

Create virtual environment (using the built-in venv module, recommended since Python 3.3+)

python3 -m venv <NAME_ENVIRONMENT>

Or using the third-party virtualenv package

virtualenv -p python3 <NAME_ENVIRONMENT>

Activate

source <NAME_ENVIRONMENT>/bin/activate



Package manager

uv (recommended)

uv is an extremely fast Python package and project manager written in Rust by Astral (the creators of Ruff). It is 10-100x faster than pip and serves as a single tool that can replace pip, pip-tools, pipx, poetry, pyenv, virtualenv, and more.

Features

  • Extremely fast dependency resolution and installation (written in Rust)
  • Drop-in replacement for pip (uv pip install)
  • Built-in Python version management (uv python install 3.12)
  • Project management with pyproject.toml and cross-platform lockfile
  • Virtual environment creation and management
  • Script execution with inline dependencies
  • Tool management (replaces pipx)
  • Workspace support for monorepos
  • Deterministic builds via uv.lock

Install

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# or via pip
pip install uv

# or via Homebrew
brew install uv

Project workflow

# Create a new project
uv init my-project
cd my-project

# Add dependencies (updates pyproject.toml and uv.lock)
uv add requests flask
uv add --dev pytest ruff

# Sync the virtual environment with the lockfile
uv sync

# Run a command inside the project environment
uv run python main.py
uv run pytest

# Remove a dependency
uv remove flask

# Build and publish
uv build
uv publish

Python version management

# Install a specific Python version
uv python install 3.12

# List available Python versions
uv python list

# Pin a project to a specific version
uv python pin 3.12

pip-compatible interface

# Works as a drop-in replacement for pip
uv pip install requests
uv pip install -r requirements.txt
uv pip freeze
uv pip compile requirements.in -o requirements.txt

pyproject.toml example

[project]
name = "my-project"
version = "0.1.0"
description = "My Python project"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",
    "flask>=3.0",
]

[dependency-groups]
dev = [
    "pytest>=8.0",
    "ruff>=0.4",
]

Documentation | GitHub

Pipenv

Pipenv automatically creates and manages a virtualenv for your projects, and adds/removes packages from your Pipfile as you install/uninstall packages. It also generates the ever-important Pipfile.lock, which is used to produce deterministic builds.

Features

  • Deterministic builds
  • Separates development and production packages within a single Pipfile
  • Automatically adds/removes packages from your Pipfile
  • Automatically creates and manages a virtualenv
  • Checks PEP 508 requirements
  • Checks installed packages for known security issues

Pipfile vs requirements.txt

# Pipfile

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
requests = "*"
numpy = "==1.18.1"
pandas = "==1.0.1"
wget = "==3.2"

[requires]
python_version = "3.11"
platform_system = 'Linux'
# requirements.txt

requests
matplotlib==3.1.3
numpy==1.18.1
pandas==1.0.1
wget==3.2

Install

pip3 install --user pipenv

Create Pipfile and virtual environment

  1. Create environment

    pipenv --python 3
  2. See where virtual environment is installed

    pipenv --venv
  3. Activate environment

    pipenv shell
  4. Install packages with Pipfile

    pipenv install flask
    # or
    pipenv install --dev flask
  5. Create lock file

    pipenv lock

Python Package Index

Doc Python Package Index

Poetry

Doc Poetry

Conda

Doc Conda



Requirements File

requirements.txt is a file containing a list of items to be installed using pip install.

Principal Commands

  1. Visualize installed packages

    pip3 freeze
  2. Generate the requirements.txt file

    pip3 freeze > requirements.txt
  3. Test

    cat requirements.txt
  4. Install the packages listed in requirements.txt

    pip3 install -r requirements.txt



Deterministic Build

Using pip with a requirements.txt file has a real issue: the build isn't deterministic. What I mean by that is, given the same input (the requirements.txt file), pip does not always produce the same environment.

pip-tools

A set of command-line tools to help you keep your pip-based packages fresh and ensure deterministic builds.

Features

  • Distinguish direct dependencies and versions
  • Freeze a set of exact packages and versions that we know work
  • Make it reasonably easy to update packages
  • Take advantage of pip's hash checking to give a little more confidence that packages haven't been modified (e.g. via a DNS attack)
  • Stable
Principal Commands

  1. Install

    pip install pip-tools
  2. Get package versions

    pip3 freeze > requirements.in
  3. Generate hashes and list dependencies

    pip-compile --generate-hashes requirements.in

    output: requirements.txt

  4. Install packages with hash checking

    pip-sync requirements.txt
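For illustration, a requirements.txt produced with --generate-hashes pins every package to an exact version plus the hashes of its archives (the versions and sha256 values below are elided placeholders, not real output):

```
requests==2.31.0 \
    --hash=sha256:... \
    --hash=sha256:...
certifi==2024.2.2 \
    --hash=sha256:...
```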



Compiler and interpreter

CPython can be defined as both an interpreter and a compiler.

  • The compiler converts the .py source file into a .pyc bytecode for the Python virtual machine.
  • The interpreter executes this bytecode on the virtual machine.
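The built-in dis module makes this two-step process visible: it shows the bytecode the compiler produced for a function, which is what the interpreter actually executes. A minimal sketch:

```python
import dis

def add(a, b):
    return a + b

# Disassemble the compiled bytecode that the interpreter will execute.
# The exact opcodes vary by CPython version (e.g. BINARY_ADD vs BINARY_OP),
# but the output always ends with a RETURN instruction.
dis.dis(add)
```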


CPython's Design

The principal feature of CPython is that it makes use of a global interpreter lock (GIL). This is a mechanism used in computer-language interpreters to synchronize the execution of threads so that only one native thread can execute at a time.
Therefore, for a CPU-bound task in Python, a single-process, multi-threaded program will not improve performance. However, this does not mean multithreading is useless in Python: for an I/O-bound task, multithreading can improve performance.

Multithreading in Python

Python does support multithreading despite the GIL. Using threads, we can make better use of a CPU that would otherwise sit idle while waiting on I/O, such as memory I/O, hard drive I/O, or network I/O.

This can happen when multiple threads are servicing separate clients: one thread may be waiting for a client to reply, another may be waiting for a database query to execute, while a third is actually executing Python code. Another example is reading multiple images from disk.

NOTE: we have to be careful and use locks when necessary. A lock makes sure that only one thread writes to shared memory at a time, but it also introduces some overhead.


Community Consensus and Free-Threaded Python (PEP 703)

Historically, removing the GIL would have hurt single-threaded performance (early attempts made Python 3 slower than Python 2 in single-threaded benchmarks). Another problem was that removing the GIL would break existing C extensions, which depend heavily on it.
However, PEP 703 was accepted in October 2023, and Python 3.13 (released October 2024) includes an experimental free-threaded mode in which the GIL can be disabled:

# Run the free-threaded build of Python 3.13 (the GIL is disabled by default)
python3.13t script.py

# Or control the GIL explicitly (only the free-threaded build supports this)
PYTHON_GIL=0 python3.13t script.py
python3.13t -X gil=0 script.py
# Check GIL status at runtime
import sys
print(sys._is_gil_enabled())  # Python 3.13+

Key points about free-threaded Python:

  • The free-threaded build uses python3.13t (the "t" suffix means "threaded")
  • It is still experimental -- not all C extensions support it yet
  • Performance varies by workload: some CPU-bound tasks see significant speedups, while single-threaded code may be slightly slower
  • C extension authors must explicitly opt in via Py_mod_gil slot



How Python runs a program

  1. Tokenize the source code: Parser/tokenizer.c
  2. Parse the stream of tokens into an Abstract Syntax Tree (AST): Parser/parser.c
  3. Transform AST into a Control Flow Graph: Python/compile.c
  4. Emit bytecode based on the Control Flow Graph: Python/compile.c
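The first stages of that pipeline are exposed in the standard library, so the steps above can be sketched from Python itself:

```python
import ast
import dis

source = "x = 1 + 2"

# Parse the source into an Abstract Syntax Tree (tokenizing + parsing).
tree = ast.parse(source)
print(ast.dump(tree))

# Compile the AST down to a code object containing bytecode.
code = compile(tree, filename="<string>", mode="exec")
dis.dis(code)
```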


How Python searches for modules

When Python executes this statement:

import my_lib

The interpreter searches for my_lib.py in a list of directories assembled from the following sources:

  • Current directory
  • The list of directories contained in the PYTHONPATH environment variable
  • The installation-dependent default directories, set when Python was installed

The resulting search path can be accessed using the sys module:

import sys

sys.path
# ['', '/usr/lib/python311.zip', 
# '/usr/lib/python3.11',
# '/usr/lib/python3.11/lib-dynload',
# '/home/campos/.local/lib/python3.11/site-packages',
# '/usr/local/lib/python3.11/dist-packages',
# '/usr/lib/python3/dist-packages']

Now, to see where a package was imported from, you can use the __file__ attribute:

import zipp

zipp.__file__
# '/usr/lib/python3/dist-packages/zipp.py'

NOTE: you can see that the __file__ directory is in the list of directories searched by the interpreter.



How Python manages process and threads

CPython uses the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. This has important implications:

  • CPU-bound tasks: Multi-threading does NOT improve performance due to the GIL. Use multiprocessing or concurrent.futures.ProcessPoolExecutor instead.
  • I/O-bound tasks: Multi-threading IS effective because threads release the GIL while waiting for I/O (network, disk, etc.). Use threading or concurrent.futures.ThreadPoolExecutor.
  • Async I/O: For high-concurrency I/O-bound workloads, asyncio provides cooperative multitasking with a single thread.
# CPU-bound: use multiprocessing
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor() as executor:
    results = executor.map(cpu_heavy_func, data)

# I/O-bound: use threading
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor() as executor:
    results = executor.map(io_bound_func, urls)

# Async I/O (requires the third-party aiohttp package)
import asyncio
import aiohttp

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [session.get(url) for url in urls]
        return await asyncio.gather(*tasks)




How Python manages memory

Python uses automatic memory management with two key mechanisms:

  1. Reference Counting: Every object has a reference count. When it drops to zero, the memory is immediately freed.
  2. Garbage Collector: Handles circular references that reference counting cannot detect. Uses a generational approach (3 generations) to optimize collection.
import sys

a = [1, 2, 3]
print(sys.getrefcount(a))  # reference count (includes the getrefcount arg itself)

import gc
gc.collect()  # manually trigger garbage collection
print(gc.get_stats())  # stats per generation

Key points:

  • Small integers (-5 to 256) and interned strings are cached and reused.
  • Memory pools: CPython uses a private heap with an internal memory allocator (pymalloc) for small objects (< 512 bytes).
  • Use tracemalloc to trace memory allocations and find leaks.
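A minimal sketch of two of these points: small-integer caching, and the garbage collector reclaiming a reference cycle that reference counting alone cannot free:

```python
import gc

# Small integers (-5 to 256) are cached: both names point to one object.
x = 256
y = 256
print(x is y)  # True

# A reference cycle: a -> b -> a. Deleting the names leaves the cycle
# alive (each refcount stays > 0), so the cyclic GC must reclaim it.
class Node:
    def __init__(self):
        self.ref = None

a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b
print(gc.collect() >= 2)  # True: at least the two Node objects collected
```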




How to deeply understand Python code execution

Several tools help you trace, debug, and profile Python code:

Debuggers:

  • pdb (built-in): Insert breakpoint() in your code and step through with n (next), s (step into), c (continue).
  • PyCharm Debugger: Set breakpoints visually, inspect variables, evaluate expressions.
  • VS Code Debugger: Similar to PyCharm with the Python extension.

Profilers:

  • cProfile: Built-in CPU profiler. Run with python -m cProfile -s cumtime script.py.
  • tracemalloc: Built-in memory profiler. Traces memory allocations to find leaks.
  • line_profiler: Line-by-line CPU profiling with @profile decorator.

Other Tools:

  • coverage.py: Measure code coverage of your tests with coverage run -m pytest && coverage report.
  • PySnooper: Decorator-based tracing that logs every line execution, variable changes, and return values.
# Using pdb
def buggy_function(x):
    breakpoint()  # drops into pdb here
    return x * 2

# Using tracemalloc
import tracemalloc
tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)




Static code analysis

Static code analysis serves to evaluate code quality. This analysis should be done before submitting code for review. Static code analysis can check:

  • Code styling analysis
  • Comment styling analysis
  • Error detection
  • Duplicate code detection
  • Unused code detection
  • Complexity analysis
  • Security linting

The characteristics of a static analysis are:

  • Provides insight into code without executing it
  • Can automate code quality maintenance
  • Can automate the search for bugs at the early stages
  • Can automate the finding of security problems

A linter is a static code analysis tool.

Pylint

Pylint is a linter that checks for errors in Python code, tries to enforce a coding standard, and looks for code smells. Its principal features are:

  • Pylint follows the PEP 8 style guide.
  • It's possible to automate it with Jenkins.
  • It is fully customizable through a .pylintrc file where you can choose which errors or conventions are relevant to you.
  • Usage
    # Get Errors & Warnings
    pylint -rn <file/dir> --rcfile=<.pylintrc>
    
    # Get Full Report
    pylint <file/dir> --rcfile=<.pylintrc>

Ruff

Ruff is an extremely fast Python linter and formatter written in Rust. It can replace Pylint, Pyflakes, isort, and many other tools in a single pass.

# Lint the whole project
ruff check ./src/

# Lint and auto-fix
ruff check --fix ./src/

# Format (replaces black)
ruff format ./src/

Pyflakes

Documentation Pyflakes

Mypy

Documentation Mypy

Prospector

Documentation Prospector



Other Tools to make an effective Python style guide

Isort

isort is a Python tool/library for sorting imports alphabetically, automatically divided into sections. It is very useful in projects where we deal with a lot of imports [6].

# sort the whole project (isort 5+ traverses directories automatically)
isort ./src/

# just check for errors
isort script.py --check-only

Unify

Some developers like to write string literals in single quotes, others in double quotes. To unify quoting across the whole project, there is a tool that automatically aligns it with your style guide: unify [6].

unify --in-place -r ./src/

The -r flag makes it work recursively on files in the folder.


docformatter

docformatter is a utility that helps bring your docstrings in line with PEP 257 [6]. The standard specifies how documentation should be written.

docformatter --in-place example.py

Autoformatters

There are also automatic code formatters now, here are the popular ones [6]:

  • black (you don't need a style guide because you don't have a choice)
  • ruff format (extremely fast, drop-in replacement for black)
  • autopep8 (makes your python script conform to PEP 8 style guide)
  • yapf (customizable style guide; note: relies on lib2to3 which has limited support for Python 3.10+ syntax)

Settings files to text editor and IDE
  • EditorConfig
  • Gitattributes



Principal style guides

To keep the code consistent and make sure it's readable, style guides can help.



My Knobs

Indentation and Length
  • 4 spaces
  • Limit docstring and comment lines to a maximum of 72 characters
  • Limit code lines to a maximum of 79 characters

Naming Convention
  • Class Name (PascalCase): CapWords()
  • Variables (snake_case): cat_words
  • Constants: MAX_OVERFLOW

Exception

Limit the try clause to the minimal amount of code necessary.

Yes:

try:
    value = collection[key]
except KeyError:
    return key_not_found(key)
else:
    return handle_value(value)

No:

try:
    # Too broad!
    return handle_value(collection[key])
except KeyError:
    # Will also catch KeyError raised by handle_value()
    return key_not_found(key)
  • The goal is to answer the question "What went wrong?" programmatically, rather than just stating that "There was a problem"

Return

  • Be consistent in return statements.
  • All return statements in a function should return an expression, or none of them should.
  • If any return statement returns an expression, any return statements where no value is returned should explicitly state this as return None.

Yes:

def foo(x):
    if x >= 0:
        return math.sqrt(x)
    else:
        return None

No:

def foo(x):
    if x >= 0:
        return math.sqrt(x)



Docstrings

Docstrings must have:

  • Args
  • Returns
  • Raises
Example (Google Style Guide):
def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
    """Fetches rows from a Bigtable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by big_table.  Silly things may happen if
    other_silly_variable is not None.

    Args:
        big_table: An open Bigtable Table instance.
        keys: A sequence of strings representing the key of each table row
            to fetch.
        other_silly_variable: Another optional variable, that has a much
            longer name than the other args, and which does nothing.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {'Serak': ('Rigel VII', 'Preparer'),
         'Zim': ('Irk', 'Invader'),
         'Lrrr': ('Omicron Persei 8', 'Emperor')}

        If a key from the keys argument is missing from the dictionary,
        then that row was not found in the table.

    Raises:
        IOError: An error occurred accessing the bigtable.Table object.
    """
    return None



References


Creative Commons License