auto_research.search.post_processing module

class ArticleOrganizer(source_folder, target_folder='top_articles', threshold_type='rank', score_threshold=0.5, rank_threshold=30, organize_files=True, order_by_score=True, zip_folder=True, plotting=True)[source]

Bases: object

A class to organize, filter, and visualize academic papers based on their combined scores.

Parameters:
  • source_folder (str)

  • target_folder (str)

  • threshold_type (str)

  • score_threshold (float)

  • rank_threshold (int)

  • organize_files (bool)

  • order_by_score (bool)

  • zip_folder (bool)

  • plotting (bool)

source_folder

The folder where the original papers and metadata are stored.

Type:

str

target_folder

The folder where organized papers will be saved.

Type:

str

threshold_type

The filtering method (“rank” or “score”).

Type:

str

score_threshold

The minimum combined score for filtering when threshold_type is “score”.

Type:

float

rank_threshold

The number of top papers to filter when threshold_type is “rank”.

Type:

int

organize_files

Whether to organize files into the target folder.

Type:

bool

order_by_score

Whether to rename files with their combined score.

Type:

bool

zip_folder

Whether to zip the target folder and source folder.

Type:

bool

plotting

Whether to plot the combined scores of papers.

Type:

bool
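
The two filtering modes described by threshold_type, score_threshold, and rank_threshold can be illustrated with a minimal sketch. This is not the library's implementation: the helper name filter_papers, the "combined_score" and "title" dictionary keys, and the inclusive comparison (>=) are all assumptions made for illustration.

```python
from typing import Dict, List


def filter_papers(
    papers: List[Dict],
    threshold_type: str = "rank",
    score_threshold: float = 0.5,
    rank_threshold: int = 30,
) -> List[Dict]:
    """Hypothetical helper (not part of auto_research): keep the top papers."""
    # Sort by combined score, best first. The "combined_score" key is assumed.
    ranked = sorted(papers, key=lambda p: p["combined_score"], reverse=True)
    if threshold_type == "rank":
        # Keep the first rank_threshold papers of the sorted list.
        return ranked[:rank_threshold]
    if threshold_type == "score":
        # Keep papers scoring at least score_threshold (inclusive by assumption).
        return [p for p in ranked if p["combined_score"] >= score_threshold]
    raise ValueError(f"Unknown threshold_type: {threshold_type!r}")


# Made-up sample data, only to exercise the helper.
papers = [
    {"title": "A", "combined_score": 0.9},
    {"title": "B", "combined_score": 0.4},
    {"title": "C", "combined_score": 0.7},
]
top_by_rank = filter_papers(papers, threshold_type="rank", rank_threshold=2)
top_by_score = filter_papers(papers, threshold_type="score", score_threshold=0.5)
```

With "rank", the size of the result is fixed in advance; with "score", it depends on how many papers clear the cutoff.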

__init__(source_folder, target_folder='top_articles', threshold_type='rank', score_threshold=0.5, rank_threshold=30, organize_files=True, order_by_score=True, zip_folder=True, plotting=True)[source]

Initialize the ArticleOrganizer class with the given parameters.

Parameters:
  • source_folder (str) – The folder where the original papers and metadata are stored.

  • target_folder (str) – The folder where organized papers will be saved. Defaults to “top_articles”.

  • threshold_type (str) – The filtering method (“rank” or “score”). Defaults to “rank”.

  • score_threshold (float) – The minimum combined score for filtering when threshold_type is “score”. Defaults to 0.5.

  • rank_threshold (int) – The number of top papers to filter when threshold_type is “rank”. Defaults to 30.

  • organize_files (bool) – Whether to organize files into the target folder. Defaults to True.

  • order_by_score (bool) – Whether to rename files with their combined score. Defaults to True.

  • zip_folder (bool) – Whether to zip the target folder and source folder. Defaults to True.

  • plotting (bool) – Whether to plot the combined scores of papers. Defaults to True.

Return type:

None

draw(paper_list, title)[source]

Plot the combined scores of papers and save the plot as an image.

Parameters:
  • paper_list (List[Dict]) – A list of dictionaries containing paper details.

  • title (str) – The title of the plot.

Return type:

None

organize_and_visualize()[source]

Organize, filter, and visualize the papers based on the initialized parameters.

Steps:
  1. Read metadata from the source folder.
  2. Sort papers by combined score in descending order.
  3. Draw a plot of the unfiltered papers if plotting is True.
  4. Filter papers based on the selected threshold type (“rank” or “score”).
  5. Draw a plot of the filtered papers if plotting is True.
  6. Organize files into the target folder if required.
  7. Zip the target folder and source folder if required.

Return type:

None
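
Steps 6 and 7 of organize_and_visualize() (copying files into the target folder, then zipping both folders) might look roughly like the sketch below. It is an illustration using only the standard library, not the library's actual code: the function name organize, the "filename" and "combined_score" keys, and the score-prefix naming scheme are assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def organize(
    papers: List[Dict],
    source_folder: Path,
    target_folder: Path,
    order_by_score: bool = True,
    zip_folder: bool = True,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: copy selected papers, then zip both folders."""
    source, target = Path(source_folder), Path(target_folder)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for paper in papers:
        src = source / paper["filename"]  # "filename" key is assumed
        # Assumed naming scheme: prefix the file name with the combined score.
        name = f"{paper['combined_score']:.2f}_{src.name}" if order_by_score else src.name
        shutil.copy2(src, target / name)
        copied.append(name)
    archives = []
    if zip_folder:
        # Write <folder>.zip next to each folder, so an archive never contains itself.
        for folder in (target, source):
            archives.append(shutil.make_archive(str(folder), "zip", root_dir=folder))
    return copied, archives


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "papers"
    source.mkdir()
    (source / "a.pdf").write_bytes(b"demo a")
    (source / "c.pdf").write_bytes(b"demo c")
    selected = [
        {"filename": "a.pdf", "combined_score": 0.9},
        {"filename": "c.pdf", "combined_score": 0.7},
    ]
    names, archives = organize(selected, source, Path(tmp) / "top_articles")
    archive_names = [Path(a).name for a in archives]
    archives_exist = all(Path(a).exists() for a in archives)
```

Copying rather than moving leaves the source folder intact, which matters here because the source folder is zipped afterwards.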