auto_research.search.post_processing module

class ArticleOrganizer(source_folder, target_folder='top_articles', threshold_type='rank', score_threshold=0.5, rank_threshold=30, organize_files=True, order_by_score=True, zip_folder=True, plotting=True)

Bases: object

A class to organize, filter, and visualize academic papers based on their combined scores.

This class provides functionality to filter papers by score or rank, organize files into a target folder, and visualize the distribution of paper scores.

Parameters:
  • source_folder (str)

  • target_folder (str)

  • threshold_type (str)

  • score_threshold (float)

  • rank_threshold (int)

  • organize_files (bool)

  • order_by_score (bool)

  • zip_folder (bool)

  • plotting (bool)

source_folder

The folder where the original papers and metadata are stored.

Type:

str

target_folder

The folder where organized papers will be saved.

Type:

str

threshold_type

The filtering method (“rank” or “score”).

Type:

str

score_threshold

The minimum combined score for filtering when threshold_type is “score”.

Type:

float

rank_threshold

The number of top papers to filter when threshold_type is “rank”.

Type:

int

organize_files

Whether to organize files into the target folder.

Type:

bool

order_by_score

Whether to rename files with their combined score.

Type:

bool

zip_folder

Whether to zip the target folder and source folder.

Type:

bool

plotting

Whether to plot the combined scores of papers.

Type:

bool

Example

>>> organizer = ArticleOrganizer(
...     source_folder="papers",
...     target_folder="top_papers",
...     threshold_type="rank",
...     rank_threshold=10,
... )
>>> organizer.organize_and_visualize()
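
The two filtering modes described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the helper name `filter_papers` is hypothetical, and the inclusive comparison (`>=`) for the score threshold is an assumption, since the documentation only says "minimum combined score".

```python
from __future__ import annotations

from typing import Any


def filter_papers(
    papers: list[dict[str, Any]],
    threshold_type: str = "rank",
    score_threshold: float = 0.5,
    rank_threshold: int = 30,
) -> list[dict[str, Any]]:
    """Hypothetical helper: keep the top papers by rank or by minimum score."""
    # Sort a copy in descending score order; the caller's list is not modified.
    ranked = sorted(papers, key=lambda p: p["combined_score"], reverse=True)
    if threshold_type == "rank":
        # Keep the first `rank_threshold` papers.
        return ranked[:rank_threshold]
    if threshold_type == "score":
        # Assumption: the threshold is inclusive.
        return [p for p in ranked if p["combined_score"] >= score_threshold]
    raise ValueError(
        f"threshold_type must be 'rank' or 'score', got {threshold_type!r}"
    )


papers = [
    {"title": "Paper A", "combined_score": 0.42},
    {"title": "Paper B", "combined_score": 0.91},
    {"title": "Paper C", "combined_score": 0.67},
]

top_one = filter_papers(papers, threshold_type="rank", rank_threshold=1)
above_half = filter_papers(papers, threshold_type="score", score_threshold=0.5)

print([p["title"] for p in top_one])     # ['Paper B']
print([p["title"] for p in above_half])  # ['Paper B', 'Paper C']
```

With "rank", the result size is fixed and the scores of the kept papers vary; with "score", the cutoff is fixed and the number of kept papers varies.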

__init__(source_folder, target_folder='top_articles', threshold_type='rank', score_threshold=0.5, rank_threshold=30, organize_files=True, order_by_score=True, zip_folder=True, plotting=True)

Initialize the ArticleOrganizer class with the given parameters.

Parameters:
  • source_folder (str) – The folder where the original papers and metadata are stored.

  • target_folder (str) – The folder where organized papers will be saved. Defaults to “top_articles”.

  • threshold_type (str) – The filtering method (“rank” or “score”). Defaults to “rank”.

  • score_threshold (float) – The minimum combined score for filtering when threshold_type is “score”. Defaults to 0.5.

  • rank_threshold (int) – The number of top papers to filter when threshold_type is “rank”. Defaults to 30.

  • organize_files (bool) – Whether to organize files into the target folder. Defaults to True.

  • order_by_score (bool) – Whether to rename files with their combined score. Defaults to True.

  • zip_folder (bool) – Whether to zip the target folder and source folder. Defaults to True.

  • plotting (bool) – Whether to plot the combined scores of papers. Defaults to True.

Return type:

None

Example

>>> organizer = ArticleOrganizer("papers", "top_papers")
>>> isinstance(organizer, ArticleOrganizer)
True

draw(paper_list, title)

Plot the combined scores of papers and save the plot as an image.

Parameters:
  • paper_list (list[dict]) – A list of dictionaries containing paper details.

  • title (str) – The title of the plot.

Return type:

None

Example

>>> organizer = ArticleOrganizer("papers")
>>> papers = [{"title": "Paper 1", "combined_score": 0.9, "downloaded": True}]
>>> organizer.draw(papers, "Test Plot")
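
As a rough illustration of what plotting combined scores and saving the plot as an image can look like, here is a sketch using matplotlib. The function names `score_series` and `plot_scores`, the bar-chart layout, and the output file name `scores.png` are all assumptions made for this example; the documentation does not state how `draw` renders or names its output.

```python
from __future__ import annotations

from typing import Any


def score_series(paper_list: list[dict[str, Any]]) -> tuple[list[int], list[float]]:
    """Hypothetical helper: return 1-based ranks and scores, highest score first."""
    scores = sorted((float(p["combined_score"]) for p in paper_list), reverse=True)
    ranks = list(range(1, len(scores) + 1))
    return ranks, scores


def plot_scores(paper_list: list[dict[str, Any]], title: str,
                out_path: str = "scores.png") -> str:
    """Hypothetical helper: draw a bar chart of scores and save it to `out_path`."""
    # Imported lazily so score_series() works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: write files, no display needed
    import matplotlib.pyplot as plt

    ranks, scores = score_series(paper_list)
    fig, ax = plt.subplots()
    ax.bar(ranks, scores)
    ax.set_xlabel("Rank")
    ax.set_ylabel("Combined score")
    ax.set_title(title)
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)  # release the figure's memory
    return out_path


papers = [
    {"title": "Paper 1", "combined_score": 0.9, "downloaded": True},
    {"title": "Paper 2", "combined_score": 0.4, "downloaded": True},
    {"title": "Paper 3", "combined_score": 0.7, "downloaded": True},
]

ranks, scores = score_series(papers)
print(ranks)   # [1, 2, 3]
print(scores)  # [0.9, 0.7, 0.4]

# To write the image (requires matplotlib):
# plot_scores(papers, "Test Plot")
```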

organize_and_visualize()

Organize, filter, and visualize the papers based on the initialized parameters.

Steps:
  1. Read metadata from the source folder.
  2. Sort papers by combined score in descending order.
  3. Draw a plot of the unfiltered papers if plotting is True.
  4. Filter papers based on the selected threshold type (“rank” or “score”).
  5. Draw a plot of the filtered papers if plotting is True.
  6. Organize files into the target folder if required, preventing duplicates.
  7. Zip the target folder and source folder if required.

Example

>>> organizer = ArticleOrganizer("papers")
>>> organizer.organize_and_visualize()  # This would process all papers in "papers"
Return type:

None
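
The file-handling steps listed for organize_and_visualize (read metadata, sort, filter by rank, copy without duplicates, zip) can be sketched with the standard library alone. This is an illustration under several assumptions, not the library's code: the metadata file name `metadata.json`, the `filename` key, the `001_0.90_` naming scheme, and the function name `organize` are hypothetical. The sketch also zips only the target folder, whereas the documented method zips both the target and source folders.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


def organize(source_folder, target_folder, rank_threshold=30, zip_folder=True):
    """Hypothetical helper: copy the top-ranked papers and optionally zip them."""
    source, target = Path(source_folder), Path(target_folder)
    # Assumption: metadata is a JSON list of dicts stored as "metadata.json".
    papers = json.loads((source / "metadata.json").read_text(encoding="utf-8"))
    # Assumption: only papers flagged as downloaded have a file to copy.
    papers = [p for p in papers if p.get("downloaded")]
    papers.sort(key=lambda p: p["combined_score"], reverse=True)
    selected = papers[:rank_threshold]

    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for rank, paper in enumerate(selected, start=1):
        src = source / paper["filename"]
        # Prefix with rank and score so a plain directory listing sorts by relevance.
        dst = target / f"{rank:03d}_{paper['combined_score']:.2f}_{src.name}"
        if not dst.exists():  # do not copy the same file twice
            shutil.copy2(src, dst)
        copied.append(dst.name)

    if zip_folder:
        # Creates "<target>.zip" next to the target folder.
        shutil.make_archive(str(target), "zip", root_dir=str(target))
    return copied


# Demonstration in a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "papers"
    src_dir.mkdir()
    meta = [
        {"title": "Paper 1", "combined_score": 0.9, "downloaded": True, "filename": "p1.pdf"},
        {"title": "Paper 2", "combined_score": 0.4, "downloaded": True, "filename": "p2.pdf"},
        {"title": "Paper 3", "combined_score": 0.7, "downloaded": False, "filename": "p3.pdf"},
    ]
    for m in meta:
        if m["downloaded"]:
            (src_dir / m["filename"]).write_bytes(b"placeholder")
    (src_dir / "metadata.json").write_text(json.dumps(meta), encoding="utf-8")

    copied = organize(src_dir, Path(tmp) / "top_papers", rank_threshold=1)
    archive_exists = (Path(tmp) / "top_papers.zip").exists()

print(copied)          # ['001_0.90_p1.pdf']
print(archive_exists)  # True
```

Checking `dst.exists()` before copying makes repeated runs idempotent, which is one simple way to prevent duplicates in the target folder.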