· coding · 15 min read
Python Deep Dive
Python code reference and cookbook.

Table of Contents - Python
- Table of Contents - Python
- Poetry and Pyenv: Setting up a Professional Python Development Environment
- Environment variables and
python-dotenv
- Standard Library
- Writing Tests with
pytest
- Working with Databases
- Discord.py
- Object-Oriented Programming (OOP)
- Else
- Publishing Packages on PyPi
- Protocol Buffers
- Miscellaneous
Poetry and Pyenv: Setting up a Professional Python Development Environment
Environment variables and python-dotenv
Standard Library
File Handling - Reading and Writing Files
The open()
function returns a file object.
File objects mediate access to on-disk files. File objects are also called streams.
Q: file_name (str)
is the name of a file. How do you read the file line by line in Python?
with open(file=file_name, mode="r") as file:
for idx, line in enumerate(file):
pass # line (str): A line in the file
The two arguments of open()
are ‘file’ and ‘mode’.
Modes of open()
:
- Mode
'a'
: For appending data to the end of ‘file’. - Mode
'w'
: For writing only. An existing file with the same name as ‘file’ will be erased. - Mode
'r+'
: opens a file for both reading and writing.
Q: Why use the with
keyword with the open
method?
It is good practice to use the with
keyword when dealing with file objects because the file will be properly closed after its suite finishes even if an exception is raised at some point. Using with
is also much shorter than writing equivalent try-finally
blocks.
with open(file="some_file") as file:
pass
file.closed # returns True
Q: Creating a blank file.
# Creates blank.txt
# some_path represents a path to some existing directory.
import os
path_to_file = os.path.join(some_path, "blank.txt")
with open(path_to_file, "w") as fh:
pass
# Verify this worked with...
print(f"Directories and files at {some_path}: {os.listdir(some_path)}")
- Python glossary - file object [docs]
Date: 21 年 6 月
HTTP Requests
pip install requests
Hypertext Transfer Protocol (HTTP) is a request-response protocol commonly used for sending data through web browsers.
GET
The GET request is for retrieving data without altering its state.
url: str
response: requests.Response = requests.get(url)
POST
POST
The two simplest kinds of
Command Line Applications - Argparse
Functools
Python’s functools module is used for higher-order functions and operations on callable objects. It’s a part of the standard library
Date: 21 年 8 月
Sort() and sorted()
import dataclasses
@dataclasses.dataclass
class Student:
name: str
grade: str
age: int
def __repr__(self):
return repr((self.name, self.grade, self.age))
students = [Student('John', 'A', 15),
Student('Jane', 'B', 12),
Student('Dave', 'B', 10),]
Q: Sort the students by age in ascending order.
sorted(students, key = lambda student: student.age)
Q: Sort the students by age in descending order.
sorted(students, key = lambda student: student.age, reverse=True)
Q: What is sorting stability? → Stable sorting algorithms maintain the relative order of records with equal keys, sorting repeated elements in the same order that they appear in the input.
Cloze:
- Python sorts are guaranteed to be stable.
- Because of this, when multiple records have the same key, their original order is preserved.
Q: Sort the students by descending grade and then ascending age.
students = sorted(students, key=lambda s: s.age)
students = sorted(students, key=lambda s: s.grade, reverse=True)
Binary and other bases
Q: Return a binary representation of the given number.
def int2binary(i: int) -> str:
...
→
def int2binary(i: int) -> str:
return bin(i)
>>> int2binary(3)
'0b11'
Q: A binary representation string has “0b” at the front. If you have an integer, i
, and want its binary representation without the “0b”, how can you return that efficiently?
i: int
>>> f"{i:b}"
>>> format(i, "b") # equivalent
See https://www.programiz.com/python-programming/methods/built-in/format
Q:
def add_binary(a: str, b: str):
...
Given two binary strings, a
and b
, return their sum as a binary string. As binary strings, a
and b
consist only of “0” and “1” characters.
def add_binary(a: str, b: str) -> str:
a: int = int(a, base=2)
b: int = int(b, base=2)
return f"{a + b:b}"
# return format(a + b, "b") # equivalent
Q: Return the index of the first occurence of ‘needle’ in ‘haystack’, or return -1 if ‘needle’ is not part of ‘haystack’. Return 0 if ‘needle’ is an empty string.
def find_needle(haystack: str, needle: str) -> int:
...
def find_needle(haystack: str, needle: str) -> int:
if needle == "":
return 0
str_len: int = len(needle)
for i in range(len(haystack) - str_len + 1):
if haystack[i:i + str_len] == needle:
return i
return -1
Q: How do you combine an iterable of strings into one string in Python? A: Use s.join(txt)
, where s
is a string that specifies how to join and txt
is the iterable of strings.
Q: Return the strings combined into one.
text = ['Stay', 'gold,', 'Ponyboy.']
>>> " ".join(text)
'Stay gold, Ponyboy.'
Q: Implement a function to find the longest common prefix string amongst an array of strings. If there is no common prefix, return an empty string.
def longest_common_prefix(strs: List[str]) -> str:
...
A:
def longest_common_prefix(strs: List[str]) -> str:
member: str = strs[0]
common_prefix: List[str] = []
max_idx: int = min([len(s) for s in strs])
for i, char in enumerate(member):
if i == max_idx:
break
if all([char == s[i] for s in strs]):
common_prefix.append(char)
else:
break
if len(common_prefix) == 0:
return ""
else:
return "".join(common_prefix)
Q: Implement a function that reverses an array of strings in-place without creating another array.
def reverse(strs: List[str]) -> None:
...
A:
def reverse(strs: List[str]) -> None:
stop_idx = len(strs) // 2
for i, s in enumerate(strs):
if i == stop_idx:
break
back_s = strs[-(i + 1)]
strs[i] = back_s
strs[-(i + 1)] = s
Display number in scientific notation.
def num_in_scientific_notation(num: float) -> str:
return "{:e}".format(num)
Writing Tests with pytest
Fixtures
import pytest
@pytest.fixture
def foo_obj() -> SomeType:
foo_val: SomeType
// ...
return foo_val
Other tests can access this fixture by specifying it as a required argument.
def test_a(foo_obj):
... // test logic here
def test_b(foo_obj):
... // test logic here
Choosing which tests to run in a suite with -k
When working with pytest, you often want to run a specific subset of your tests. One of the ways to filter and select which tests to run is by using the -k
option followed by an expression.
The -k
option allows you to run tests by their names. It matches the test names and any substring of names based on the expression you provide.
Examples:
Run all tests containing the substring “arg”
pytest -k arg
NOT: Run all tests that do NOT contain “arg”
pytest -k "not arg"
AND: Run all tests that contain “A” and “B”
pytest -k "A and B"
OR: Run all tests that contain either “A” or “B” (inclusive or)
pytest -k "A or B"
Else: Testing
Disabling warnings
To ignore pytest warnings at the command line: pytest -p no:warnings
See: https://docs.pytest.org/en/latest/how-to/capture-warnings.html#disabling-warnings-summary
Pytest Mock
The standard solution to creating mock objects in python is the unittest.mock
library. This section goes over how to create mocks in pytest with pytest-mock
Working with Databases
MongoDB (pymongo)
Pymongo assumes MongoDB is installed and running on the default host and port. If this is not the case, go to: https://docs.mongodb.com/manual/installation/ .
You’ll notice there are two installation types to choose from: community and enterprise edition. According to a thread on the mongodb forums,
“Core server features for developers are generally the same, but a MongoDB Enterprise subscription includes additional operational and management features, a commercial license (warranty & idemnification), as well as access to proactice support and on-demand training.
In my case, I only needed the community edition.
There are two main steps for this installation.
- Install MondoDB Server: [installation instructions (Windows)]
- Install MongoDB Shell (mongosh): [installation instructions]
The installation instructions for Community Edition are toward the bottom of the page (at the above link) where it says “Procedure”. You’re finished with step 1 when you can start MongoDB with"C:\Program Files\MongoDB\Server\5.0\bin\mongod.exe" --dbpath="c:\data\db"
- Connect to a MongoDB deployment using the mongosh [docs].
At this point you should be able to type mongosh
at the prompt and see the MongoDB Shell run. However, the commands mongo
, mongod
, and mongos
won’t run on Windows. To enable these commands, you need to add shortcuts to each command’s corresponding executable to the system’s environment variables.
Hit the Windows key, then type environment variables and hit enter. Alternatively, open the Control Panel and select “Edit the system environment variables”.
Under the “Advanced” tab, select
Environment Variables
.Under “System Variables”, select
New
.Add each command’s name to one field its executable path to the other field. For me,
mongo
’s executable was at"C:\Program Files\MongoDB\Server\5.0\bin\mongo.exe"
The other two executables were in the same directory.
Now, mongo
and mongod
work, but what do they do?
mongo
: Opens the Mongo Shell, an interactive command line interface that uses JavaScript and a related API to interact with the MongoDB client. Mongo shell is the default client for the MongoDB database server.
- Used for data manipulation
- Used for management of database instances
mongod
: Manages all of the MongoDB server tasks. mongod
is a background process used by MongoDB. It’s short for “Mongo Daemon”.
References MongoDB:
- Jayatilake, Navindu. 2019. How to get started with MongoDB in 10 minutes. [article]
- Pymongo tutorial
Discord.py
To interface with Discord programmatically, you’ll need a Discord token.
- Head to https://discord.com/login
- Open the browser’s developer console with either
Ctrl + Shift + I
or by right clicking. - Refresh the page
- Go under the “Network” tab in the console.
- Enter “/api” as the filter and then hit
Enter
. - One of the items in the table should have a name starting with “library”. Click this row.
- In the “Headers” tab, find the place that says “authorization: “. The text for this field is your Discord token.
Resources
- API Reference. discord.py
- Channel history in discord.py
- Webhook Resource. Discord Developer Portal.
- Discord.py add role command. Stack Overflow.
Object-Oriented Programming (OOP)
Abstract Base Classes
An abstract base class specifies the interface that a class should adhere to and acts as an agreement between different parts of a program.
Q: Import the abstract base class and abstract method.
from abc import ABC, abstractmethod
Q: Define a short abstract base class, “HelloWorld”, with an abstract method called “hello” that should return a string.
from abc import ABC, abstractmethod
class HelloWorld(ABC):
@abstractmethod
def hello(self) -> str:
pass
Attempting to create an instance of an abstract base class (ABC) raises an error. You can, however, create subclasses that inherit from the ABC. An ABC purely defines the kind of methods that a class that inherits from the ABC should have.
# Assume the ABC, HelloWorld, is implemented in the same file.
class JapaneseHelloWorld(HelloWorld):
def hello(self) -> str:
return "こんにちは、世界!"
class ChineseHelloWorld(HelloWorld):
def hello(self) -> str:
return "你好世界!"
Attempting to create an instance of one of the concrete classes, “JapaneseHelloWorld” and “ChineseHelloWorld”, without implementing the “hello” method would raise an error too.
Data classes
Python’s dataclass
functionality allows you to write shorter code and initialize, print, compare, and order data much more easily.
Why use data classes?
A class for a data container is often used differently. Many instances may need to be made. You probably want to order them, compare them, easily inspect the data inside them, etc.
Regular classes don’t provide much built-in convenient functionality for data oriented classes. Classes that build around a behavior may not have as many instances or be conducive to representing data structures. Because of this, some programming languages provide a type of class that is better tuned to storing data, e.g. C#‘s struct type that is good for representing data structures.
Simplest data class: All required arguments
Data classes have a built-in initialize that will help you quickly fill in an object with data. There are easy ways to print compare and order them.
# Regular class
class Person:
"""A person that has a name, job, and age."""
name: str
job: str
age: int
def __init__(self, name, job, age):
self.name = name
self.job = job
self.age = age
p1 = Person(name='Joe', job='SDE', age=25)
Q: Implement this as a dataclass
:
from dataclasses import dataclass
@dataclass
class Person:
"""A person that has a name, job, and age."""
name: str
job: str
age: int
Data class with default arguments
Let’s say all of our people usually have the same job title, which is software development engineer (“SDE”). We’d want to set this as a default parameter to save some typing.
from dataclasses import dataclass
@dataclass
class Person:
"""A person that has a name, job, and age."""
name: str
job: str = "SDE"
age: int
p1 = Person(name='Joe', age=25)
Data class with attributes that depend on other parameters
Dynamically generated attributes can be defined in data classes using the field
functionality in combination with __post_init__
.
For example, let’s say that the ‘age’ parameter automatically specifies a job if it’s low enough. Otherwise, ‘job’ will default to “SDE” like before.
from dataclasses import dataclass, field
@dataclass
class Person:
"""A person that has a name, job, and age."""
name: str
job: str = field(default = "SDE", init = False)
age: int
def __post_init__(self):
age = self.age
if (self.job == "SDE") and (age <= 18):
if age >= 14 and age <= 18:
self.job = "Student - high school"
elif age >= 12 and age <= 14:
self.job = "Student - middle school"
elif age >= 5 and age <= 12:
self.job = "Student - elementary school"
else:
self.job = "Baby, toddler, or pre-K"
Reference: ArjanCodes. 2021. If you’re not using Python DATA CLASSES yet, you should. 🚀 [YouTube]
Prefer composition over inheritance
Inheritance makes it difficult to duplicate functionality to different classes.
With composition, our goal is to separate out concepts so that they can be combined in meaningful ways. We’re not creating hierarchies of classes.
If we have a class that we want to make subclasses of, it’s beneficial to make the first one an abstract base class.
from abc import ABC, abstractmethod
from typing import Optional
class FFNN(ABC):
"""Represents a PyTorch module for a feed-forward neural network."""
@abstractmethod
def forward()
"""TODO"""
Reference: ArjanCodes. Why COMPOSITION is better than INHERITANCE - detailed Python example. 2021. [YouTube]
NamedTuple
and TypedDict
PEP 589 – TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys
Representing an object or structured data using (potentially nested) dictionaries with string keys (instead of a user-defined class) is a common pattern in Python programs. Representing JSON objects is perhaps the canonical use case, and this is popular enough that Python ships with a JSON library. This PEP proposes a way to allow such code to be type checked more effectively.
from typing import TypedDict
class Movie(TypedDict):
name: str
year: int
# a type checker should accept this code:
movie: Movie = {'name': 'Blade Runner',
'year': 1982}
Else
Publishing Packages on PyPi
References:
- https://packaging.python.org/en/latest/tutorials/packaging-projects/
- Publishing (Perfect) Python Packages on PyPi. 2020. https://youtu.be/GIF3LaRqgXo
Suppose you’re in the root directory of a GitHub repository.
Create a Python package as a subdirectory of src
in the repo’s root. We’ll call this pkg
(repo/src/pkg).
Create a blank module called “setup.py” (repo/setup.py), then open it up in your favorite text editor.
import setuptools
setuptools.setup(
name="pkg-pypi-name", # The name on PyPi. What you pip install.
version="0.0.1",
description=("A Python package description"),
package_dir={'': 'src'},
)
Think of each line of the setup()
method as configuration for the package.
Arguments of setuptools.setup
:
name
: The name on PyPi. This is the “x” in “pip install x”.version
: Version number for the package. 0.0.x version numbers imply that the package is unstable. It’s good to start out with an unstable version number before publishing the package so that the people won’t see failing code flags just because there’s some packaging issue.package_dir
: A map (dictionary) that tells setuptools the name of the package’s parent directory.
python3 -m pip install —upgrade build
Now, run python setup.py bdist_wheel
. This will create build
, build/lib
, and dist
directories inside the repository.
dist
: Houses the outputted wheel file.- Running the
bdist_wheel
also creates asrc/pkg.egg-info
directory. You should git ignore this.
Testing the local package install.
pip install -e .
Check out gitignore.io for standard ignores for common frameworks.
Try running:
python -m twine upload --repository testpypi dist/* --verbose
If the command fails, you’ll need to make source code or configuration changes and re-run python -m build
in the same directory as your pyproject.toml
. Then, you can keep checking the “twine” command.
After this command runs successfully, you should be able to view your published packaged on test.pypi.org and install it locally with
python -m pip install --index-url https://test.pypi.org/simple/ --no-deps your-package-name
To uninstall this local installation, run
python -m pip uninstall your-package-name
Protocol Buffers
Terms to know: .proto file, protocol buffer compiler, protcol buffer API, messages of a protocol buffer API
Protocol buffers are the flexible, efficient, automated solution to solve the problem of how to serialize and retrieve structured data between programming interfaces, e.g. C++ and Python.
How does a protocol buffer solve this problem?
A protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides attributes and fields for the objects that make up a protocol buffer, abstracting away the complexity of reading and writing the protocol buffer as a unit.
- The protocol buffer format supports future development efforts by making it easy to extend the format over time in such that the code can still read data encoded with an old format.
Protocol Format
References:
- Google. Protocol Buffer Basics: Python [docs]
Miscellaneous
Google Code Styling with YAPF
Install YAPF via pip: pip install yapf
To see usage instructions: yapf --help
The main command you’ll use is yapf -i [python-file]
Writing to Excel Files with XlsxWriter
Installation
- Conda install
conda install -c conda-forge XlsxWriter
- Pip install
pip install XlsxWriter
# hello-xlsx-writer.py
import xlsxwriter
workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'Hello world')
workbook.close()
References:
- https://xlsxwriter.readthedocs.io/
- https://xlsxwriter.readthedocs.io/tutorial02.html
- https://stackoverflow.com/questions/16560289/using-python-write-an-excel-file-with-columns-copied-from-another-excel-file
ML Finance Project
example w/ multivariate time series in PyTorch
Q: Neural networks can be constructed using the torch.nn
Python module.
Q: Import the package for constructing neural networks in PyTorch.
import torch.nn as nn
Q: Seaborn comes with built-in datasets.
Q: Load seaborn’s flights dataset.
flight_data = sns.load_dataset("flights")
Q: Why must time series data be scaled for sequence predictions? : When a network is fit on unscaled data, it is possible for large inputs to slow down the learning and convergence of your network and in some cases prevent the network from effectively learning your problem.
Q: What’s the sklearn import for min-max scaling data?
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
We know the field is fast moving. If the reader looking for more recent free reading resources, there are some good introductory/tutorial/survey papers on Arxiv