auto_research.search.files_management module

sanitize_filename(filename)

Sanitizes a filename by removing characters that are not allowed in Windows filenames.

Parameters:

filename (str) – The original filename to be sanitized.

Returns:

The sanitized filename, with illegal characters removed and leading/trailing spaces stripped.

Return type:

str

Example

>>> sanitize_filename("my/file:name?.txt")
'myfilename.txt'
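
A minimal sketch of how sanitize_filename could behave. It assumes that "illegal characters" means the characters Windows reserves in file names (< > : " / \ | ? *) plus ASCII control characters 0-31; the exact set used by the module is not shown here, so treat the pattern below as an assumption.

```python
import re

# Assumed character set: the Windows-reserved characters < > : " / \ | ? *
# plus ASCII control characters 0x00-0x1F.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(filename: str) -> str:
    """Remove assumed-illegal characters, then strip surrounding whitespace."""
    cleaned = _ILLEGAL_CHARS.sub("", filename)
    # str.strip() removes all leading/trailing whitespace, not only spaces.
    return cleaned.strip()


print(sanitize_filename("my/file:name?.txt"))  # myfilename.txt
```

This sketch removes characters rather than replacing them, matching the example above. It does not handle other Windows restrictions such as reserved device names (CON, NUL) or trailing periods.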

is_pdf_uncorrupted(file_path)

Checks if a PDF file is uncorrupted by attempting to open it using the fitz library.

Parameters:

file_path (str) – The path to the PDF file to be checked.

Returns:

True if the PDF is not corrupted and can be opened successfully, False otherwise.

Return type:

bool

Example

>>> is_pdf_uncorrupted("example.pdf")
True
>>> is_pdf_uncorrupted("corrupted.pdf")
Error opening PDF: <error message>
False

Notes

This function uses the fitz library (PyMuPDF) to open the PDF file. If the file cannot be opened, it is assumed to be corrupted, and the function returns False.