auto_research.search.post_processing module
- class ArticleOrganizer(source_folder, target_folder='top_articles', threshold_type='rank', score_threshold=0.5, rank_threshold=30, organize_files=True, order_by_score=True, zip_folder=True, plotting=True)
Bases: object
A class to organize, filter, and visualize academic papers based on their combined scores.
This class provides functionality to filter papers by score or rank, organize files into a target folder, and visualize the distribution of paper scores.
- Parameters:
- score_threshold
The minimum combined score for filtering when threshold_type is “score”.
- Type:
float
Example
>>> organizer = ArticleOrganizer(
...     source_folder="papers",
...     target_folder="top_papers",
...     threshold_type="rank",
...     rank_threshold=10,
... )
>>> organizer.organize_and_visualize()
- __init__(source_folder, target_folder='top_articles', threshold_type='rank', score_threshold=0.5, rank_threshold=30, organize_files=True, order_by_score=True, zip_folder=True, plotting=True)
Initialize the ArticleOrganizer class with the given parameters.
- Parameters:
source_folder (str) – The folder where the original papers and metadata are stored.
target_folder (str) – The folder where organized papers will be saved. Defaults to “top_articles”.
threshold_type (str) – The filtering method (“rank” or “score”). Defaults to “rank”.
score_threshold (float) – The minimum combined score for filtering when threshold_type is “score”. Defaults to 0.5.
rank_threshold (int) – The number of top papers to filter when threshold_type is “rank”. Defaults to 30.
organize_files (bool) – Whether to organize files into the target folder. Defaults to True.
order_by_score (bool) – Whether to rename files with their combined score. Defaults to True.
zip_folder (bool) – Whether to zip the target folder and source folder. Defaults to True.
plotting (bool) – Whether to plot the combined scores of papers. Defaults to True.
- Return type:
None
Example
>>> organizer = ArticleOrganizer("papers", "top_papers")
>>> isinstance(organizer, ArticleOrganizer)
True
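The interaction between threshold_type, score_threshold, and rank_threshold can be illustrated with a minimal sketch. The helper filter_papers below is hypothetical and is not part of this module; it assumes each paper is a dict with a "combined_score" key and that the score threshold is inclusive (>=), which the description above does not state explicitly.

```python
from typing import Any


def filter_papers(
    papers: list[dict[str, Any]],
    threshold_type: str = "rank",
    score_threshold: float = 0.5,
    rank_threshold: int = 30,
) -> list[dict[str, Any]]:
    """Hypothetical helper mirroring the documented threshold semantics."""
    # Sort by combined score, highest first (step 2 of organize_and_visualize).
    ranked = sorted(papers, key=lambda p: p["combined_score"], reverse=True)
    if threshold_type == "rank":
        # Keep only the top `rank_threshold` papers.
        return ranked[:rank_threshold]
    if threshold_type == "score":
        # Assumption: papers exactly at the threshold are kept.
        return [p for p in ranked if p["combined_score"] >= score_threshold]
    raise ValueError(f"threshold_type must be 'rank' or 'score', got {threshold_type!r}")


papers = [
    {"title": "A", "combined_score": 0.9},
    {"title": "B", "combined_score": 0.4},
    {"title": "C", "combined_score": 0.7},
]
print([p["title"] for p in filter_papers(papers, "rank", rank_threshold=2)])  # ['A', 'C']
print([p["title"] for p in filter_papers(papers, "score", score_threshold=0.8)])  # ['A']
```

Sorting before filtering means the "rank" mode is a simple slice, and the "score" mode returns its survivors already in descending order.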
- draw(paper_list, title)
Plot the combined scores of papers and save the plot as an image.
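- Parameters:
paper_list (list) – The papers to plot, each a dict with keys such as “title”, “combined_score”, and “downloaded”.
title (str) – The title of the plot.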
- Return type:
None
Example
>>> organizer = ArticleOrganizer("papers")
>>> papers = [{"title": "Paper 1", "combined_score": 0.9, "downloaded": True}]
>>> organizer.draw(papers, "Test Plot")
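A plot of combined scores saved as an image could look roughly like the following sketch. It assumes matplotlib is the plotting library and that a bar chart is used; the function name draw_scores and its output_path parameter are hypothetical, since the documented draw method does not say where the image is written.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def draw_scores(paper_list, title, output_path):
    """Hypothetical sketch: bar chart of combined scores saved to output_path."""
    labels = [p["title"] for p in paper_list]
    scores = [p["combined_score"] for p in paper_list]
    # Widen the figure as the number of papers grows so labels stay readable.
    fig, ax = plt.subplots(figsize=(max(6, 0.4 * len(labels)), 4))
    ax.bar(range(len(scores)), scores)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_ylabel("Combined score")
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(output_path)
    plt.close(fig)  # release the figure's memory
    return output_path


out = os.path.join(tempfile.mkdtemp(), "test_plot.png")
papers = [{"title": "Paper 1", "combined_score": 0.9, "downloaded": True}]
draw_scores(papers, "Test Plot", out)
```

Closing the figure after saving matters when the method is called twice per run (once for unfiltered and once for filtered papers), since pyplot otherwise keeps every figure alive.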
- organize_and_visualize()
Organize, filter, and visualize the papers based on the initialized parameters.
Steps:
1. Read metadata from the source folder.
2. Sort papers by combined score in descending order.
3. Draw a plot of the unfiltered papers if plotting is True.
4. Filter papers based on the selected threshold type (“rank” or “score”).
5. Draw a plot of the filtered papers if plotting is True.
6. Organize files into the target folder if required, preventing duplicates.
7. Zip the target folder and source folder if required.
Example
>>> organizer = ArticleOrganizer("papers")
>>> organizer.organize_and_visualize()  # This would process all papers in "papers"
- Return type:
None
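Steps 6 and 7, together with the order_by_score and zip_folder options, might be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the helper organize_files is hypothetical, each paper dict is assumed to carry a "filename" key naming a file in the source folder, and the score-prefix format used for renaming ("0.900_a.pdf") is invented for the example.

```python
import shutil
import tempfile
from pathlib import Path


def organize_files(papers, source_folder, target_folder, order_by_score=True, zip_folder=True):
    """Hypothetical sketch: copy, optionally rename, skip duplicates, then zip."""
    src, dst = Path(source_folder), Path(target_folder)
    dst.mkdir(parents=True, exist_ok=True)
    seen = set()
    for paper in papers:
        name = paper["filename"]  # assumed metadata key
        if name in seen:  # skip duplicate metadata entries
            continue
        seen.add(name)
        # Assumed naming scheme: prefix the file name with its combined score.
        new_name = f"{paper['combined_score']:.3f}_{name}" if order_by_score else name
        target = dst / new_name
        if not target.exists():  # never overwrite a file that is already organized
            shutil.copy2(src / name, target)
    archives = []
    if zip_folder:
        # make_archive writes <folder>.zip next to each folder and returns its path.
        for folder in (dst, src):
            archives.append(shutil.make_archive(str(folder), "zip", root_dir=folder))
    return archives


root = Path(tempfile.mkdtemp())
(root / "papers").mkdir()
for fname in ("a.pdf", "b.pdf"):
    (root / "papers" / fname).write_bytes(b"%PDF-stub")
papers = [
    {"filename": "a.pdf", "combined_score": 0.9},
    {"filename": "b.pdf", "combined_score": 0.7},
    {"filename": "a.pdf", "combined_score": 0.9},  # duplicate entry
]
archives = organize_files(papers, root / "papers", root / "top_papers")
print(sorted(p.name for p in (root / "top_papers").iterdir()))  # ['0.700_b.pdf', '0.900_a.pdf']
```

Deduplicating on the source file name keeps a paper that appears twice in the metadata from being copied twice under the same score-prefixed name.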