Parser

Parser module for parsing torrent titles and extracting metadata using RTN patterns.

The module provides functions for parsing torrent titles, extracting metadata, and ranking torrents based on user preferences.

Functions:
  • parse: Parse a torrent title and enrich it with additional metadata.

Classes:
  • Torrent: Represents a torrent with metadata parsed from its title and additional computed properties.
  • RTN: Rank Torrent Name class for parsing and ranking torrent titles based on user preferences.

Methods:
  • rank: Parses a torrent title, computes its rank, and returns a Torrent object with metadata and ranking.

For more information on each function or class, refer to the respective docstrings.

RTN

RTN (Rank Torrent Name) class for parsing and ranking torrent titles based on user preferences.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `settings` | SettingsModel | The settings model with user preferences for parsing and ranking torrents. | required |
| `ranking_model` | BaseRankingModel | The model defining the ranking logic and score computation. | required |

Notes
  • The settings and ranking_model must be provided and must be valid instances of SettingsModel and BaseRankingModel.
  • The lev_threshold is read from settings.options["title_similarity"] (defaulting to 0.85 when that key is absent) and is used to determine whether a torrent title matches a correct title.
Example
from RTN import RTN
from RTN.models import SettingsModel, DefaultRanking

settings_model = SettingsModel()
ranking_model = DefaultRanking()
rtn = RTN(settings_model, ranking_model)
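
The threshold described in the notes above can be pictured with a small, self-contained sketch. It assumes a plain dictionary in place of settings.options and uses difflib's SequenceMatcher as a stand-in for the Levenshtein ratio that RTN computes, so the exact scores will differ from the library's; the helper names `similarity` and `matches_title` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio (difflib stand-in, not RTN's Levenshtein ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches_title(parsed_title: str, correct_title: str, threshold: float = 0.85) -> bool:
    """Treat the titles as a match when the ratio reaches the threshold."""
    return similarity(parsed_title, correct_title) >= threshold


# Hypothetical options dict without a "title_similarity" entry,
# so the 0.85 fallback from the constructor applies.
options = {}
threshold = options.get("title_similarity", 0.85)

print(matches_title("The Walking Dead", "the walking dead", threshold))
print(matches_title("The Walking Dead", "Breaking Bad", threshold))
```

Raising the threshold makes matching stricter, and lowering it tolerates more variation between the parsed title and the expected one.
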
Source code in RTN/parser.py
class RTN:
    """
    RTN (Rank Torrent Name) class for parsing and ranking torrent titles based on user preferences.

    Args:
        `settings` (SettingsModel): The settings model with user preferences for parsing and ranking torrents.
        `ranking_model` (BaseRankingModel): The model defining the ranking logic and score computation.

    Notes:
        - The `settings` and `ranking_model` must be provided and must be valid instances of `SettingsModel` and `BaseRankingModel`.
        - The `lev_threshold` is read from `settings.options["title_similarity"]` (defaulting to 0.85 when that key is absent) and is used to determine whether a torrent title matches a correct title.

    Example:
        ```python
        from RTN import RTN
        from RTN.models import SettingsModel, DefaultRanking

        settings_model = SettingsModel()
        ranking_model = DefaultRanking()
        rtn = RTN(settings_model, ranking_model)
        ```
    """

    def __init__(self, settings: SettingsModel, ranking_model: BaseRankingModel):
        """
        Initializes the RTN class with settings and a ranking model.

        Args:
            `settings` (SettingsModel): The settings model with user preferences for parsing and ranking torrents.
            `ranking_model` (BaseRankingModel): The model defining the ranking logic and score computation.

        Raises:
            ValueError: If settings or a ranking model is not provided.
            TypeError: If settings is not an instance of SettingsModel or the ranking model is not an instance of BaseRankingModel.

        Example:
            ```python
            from RTN import RTN
            from RTN.models import SettingsModel, DefaultRanking

            settings_model = SettingsModel()
            ranking_model = DefaultRanking()
            rtn = RTN(settings_model, ranking_model)
            ```
        """
        self.settings = settings
        self.ranking_model = ranking_model
        self.lev_threshold = self.settings.options.get("title_similarity", 0.85)

    def rank(self, raw_title: str, infohash: str, correct_title: str = "", remove_trash: bool = False, **kwargs) -> Torrent:
        """
        Parses a torrent title, computes its rank, and returns a Torrent object with metadata and ranking.

        Args:
            `raw_title` (str): The original title of the torrent to parse.
            `infohash` (str): The SHA-1 hash identifier of the torrent.
            `correct_title` (str): The correct title to compare against for similarity. Defaults to an empty string.
            `remove_trash` (bool): Whether to check for trash patterns and raise an error if found. Defaults to False.

        Returns:
            Torrent: A Torrent object with metadata and ranking information.

        Raises:
            ValueError: If the title or infohash is not provided.
            TypeError: If the title or infohash is not a string.
            GarbageTorrent: If the infohash is not 40 characters long, if the rank is below `settings.options["remove_ranks_under"]`, or, when `remove_trash` is True, if the title is identified as trash or falls below the title-similarity threshold.

        Notes:
            - If `correct_title` is provided, the Levenshtein ratio will be calculated between the parsed title and the correct title.
            - If `remove_trash` is True and the ratio is below the threshold, a `GarbageTorrent` error will be raised.
            - If no correct title is provided, the Levenshtein ratio will be set to 0.0.

        Example:
            ```python
            from RTN import RTN
            from RTN.models import SettingsModel, DefaultRanking
            from RTN.parser import ParsedData, Torrent

            settings_model = SettingsModel()
            ranking_model = DefaultRanking()
            rtn = RTN(settings_model, ranking_model)
            torrent = rtn.rank(
                "The Walking Dead S05E03 720p HDTV x264-ASAP[ettv]",
                "c08a9ee8ce3a5c2c08865e2b05406273cabc97e7",
                correct_title="The Walking Dead",  # without this, lev_ratio stays 0.0
            )
            assert isinstance(torrent, Torrent)
            assert isinstance(torrent.data, ParsedData)
            assert torrent.fetch
            assert torrent.rank > 0
            assert torrent.lev_ratio > 0.0
            ```
        """
        if not raw_title or not infohash:
            raise ValueError("Both the title and infohash must be provided.")

        if len(infohash) != 40:
            raise GarbageTorrent("The infohash must be a valid SHA-1 hash and 40 characters in length.")

        parsed_data: ParsedData = parse(raw_title) # type: ignore

        lev_ratio = 0.0
        if correct_title:
            aliases = kwargs.get("aliases", {})
            lev_ratio: float = get_lev_ratio(correct_title, parsed_data.parsed_title, self.lev_threshold, aliases)

        fetch: bool = check_fetch(parsed_data, self.settings)
        rank: int = get_rank(parsed_data, self.settings, self.ranking_model)

        if remove_trash:
            if not fetch:
                raise GarbageTorrent(f"'{raw_title}' has been identified as trash based on user settings and will be ignored.")
            if correct_title and lev_ratio < self.lev_threshold:
                raise GarbageTorrent(f"'{raw_title}' does not match the correct title, got ratio of {lev_ratio}")

        if rank < self.settings.options["remove_ranks_under"]:
            raise GarbageTorrent(f"'{raw_title}' does not meet the minimum rank requirement, got rank of {rank}")

        return Torrent(
            infohash=infohash,
            raw_title=raw_title,
            data=parsed_data,
            fetch=fetch,
            rank=rank,
            lev_ratio=lev_ratio
        )

__init__(settings, ranking_model)

Initializes the RTN class with settings and a ranking model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `settings` | SettingsModel | The settings model with user preferences for parsing and ranking torrents. | required |
| `ranking_model` | BaseRankingModel | The model defining the ranking logic and score computation. | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If settings or a ranking model is not provided. |
| TypeError | If settings is not an instance of SettingsModel or the ranking model is not an instance of BaseRankingModel. |

Example
from RTN import RTN
from RTN.models import SettingsModel, DefaultRanking

settings_model = SettingsModel()
ranking_model = DefaultRanking()
rtn = RTN(settings_model, ranking_model)
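
The constructor source shown on this page only stores its arguments, so the following is an assumption about how the documented ValueError and TypeError checks might be written. `SettingsModel` and `BaseRankingModel` here are hypothetical stand-ins defined inline, not the real classes from RTN.models, and `validate_rtn_args` is an illustrative helper name.

```python
class SettingsModel:
    """Stand-in for RTN.models.SettingsModel; only carries an options dict."""

    def __init__(self, options=None):
        self.options = options or {}


class BaseRankingModel:
    """Stand-in for RTN.models.BaseRankingModel."""


def validate_rtn_args(settings, ranking_model) -> float:
    """Check both arguments and return the title-similarity threshold."""
    if settings is None or ranking_model is None:
        raise ValueError("Both settings and a ranking model must be provided.")
    if not isinstance(settings, SettingsModel):
        raise TypeError("settings must be an instance of SettingsModel.")
    if not isinstance(ranking_model, BaseRankingModel):
        raise TypeError("ranking_model must be an instance of BaseRankingModel.")
    # Same fallback as the constructor: 0.85 when "title_similarity" is unset.
    return settings.options.get("title_similarity", 0.85)


print(validate_rtn_args(SettingsModel({"title_similarity": 0.9}), BaseRankingModel()))
```
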
Source code in RTN/parser.py
def __init__(self, settings: SettingsModel, ranking_model: BaseRankingModel):
    """
    Initializes the RTN class with settings and a ranking model.

    Args:
        `settings` (SettingsModel): The settings model with user preferences for parsing and ranking torrents.
        `ranking_model` (BaseRankingModel): The model defining the ranking logic and score computation.

    Raises:
        ValueError: If settings or a ranking model is not provided.
        TypeError: If settings is not an instance of SettingsModel or the ranking model is not an instance of BaseRankingModel.

    Example:
        ```python
        from RTN import RTN
        from RTN.models import SettingsModel, DefaultRanking

        settings_model = SettingsModel()
        ranking_model = DefaultRanking()
        rtn = RTN(settings_model, ranking_model)
        ```
    """
    self.settings = settings
    self.ranking_model = ranking_model
    self.lev_threshold = self.settings.options.get("title_similarity", 0.85)

rank(raw_title, infohash, correct_title='', remove_trash=False, **kwargs)

Parses a torrent title, computes its rank, and returns a Torrent object with metadata and ranking.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `raw_title` | str | The original title of the torrent to parse. | required |
| `infohash` | str | The SHA-1 hash identifier of the torrent. | required |
| `correct_title` | str | The correct title to compare against for similarity. | `''` |
| `remove_trash` | bool | Whether to check for trash patterns and raise an error if found. | `False` |
| `**kwargs` | Any | Optional keyword arguments. An `aliases` entry, if present, is passed to the title-similarity check. | `{}` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Torrent | Torrent | A Torrent object with metadata and ranking information. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the title or infohash is not provided. |
| TypeError | If the title or infohash is not a string. |
| GarbageTorrent | If the infohash is not 40 characters long, if the rank is below settings.options["remove_ranks_under"], or, when remove_trash is True, if the title is identified as trash or falls below the title-similarity threshold. |
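
The source shown on this page checks only that the infohash is 40 characters long. A caller who wants a stricter pre-check could also verify that every character is hexadecimal. The sketch below is an assumption about how that might be done and is not part of RTN; `is_sha1_infohash` is a hypothetical helper name.

```python
import re

# A SHA-1 digest written in hexadecimal is exactly 40 hex characters.
_SHA1_HEX = re.compile(r"[0-9a-fA-F]{40}")


def is_sha1_infohash(value) -> bool:
    """Return True only for 40-character hexadecimal strings."""
    return isinstance(value, str) and _SHA1_HEX.fullmatch(value) is not None


print(is_sha1_infohash("c08a9ee8ce3a5c2c08865e2b05406273cabc97e7"))
print(is_sha1_infohash("z" * 40))
```
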

Notes
  • If correct_title is provided, the Levenshtein ratio will be calculated between the parsed title and the correct title.
  • If remove_trash is True and the ratio is below the threshold, a GarbageTorrent error will be raised.
  • If no correct title is provided, the Levenshtein ratio will be set to 0.0.
Example
from RTN import RTN
from RTN.models import SettingsModel, DefaultRanking
from RTN.parser import ParsedData, Torrent

settings_model = SettingsModel()
ranking_model = DefaultRanking()
rtn = RTN(settings_model, ranking_model)
torrent = rtn.rank(
    "The Walking Dead S05E03 720p HDTV x264-ASAP[ettv]",
    "c08a9ee8ce3a5c2c08865e2b05406273cabc97e7",
    correct_title="The Walking Dead",  # without this, lev_ratio stays 0.0
)
assert isinstance(torrent, Torrent)
assert isinstance(torrent.data, ParsedData)
assert torrent.fetch
assert torrent.rank > 0
assert torrent.lev_ratio > 0.0
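
Because rank signals rejected titles by raising GarbageTorrent rather than returning a value, callers ranking many candidates typically wrap each call in a try/except. The sketch below illustrates that pattern with stand-ins so it runs without RTN installed: `GarbageTorrent`, `RankedTorrent`, `toy_rank`, and `rank_all` are hypothetical names defined here, and the scoring rule is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


class GarbageTorrent(Exception):
    """Stand-in for RTN's GarbageTorrent exception."""


@dataclass
class RankedTorrent:
    """Stand-in for the Torrent object; only the fields used here."""
    raw_title: str
    infohash: str
    rank: int


def rank_all(
    rank_fn: Callable[[str, str], RankedTorrent],
    candidates: Iterable[Tuple[str, str]],
) -> List[RankedTorrent]:
    """Rank every candidate, skip rejected ones, and return best first."""
    kept = []
    for raw_title, infohash in candidates:
        try:
            kept.append(rank_fn(raw_title, infohash))
        except GarbageTorrent:
            continue  # rejected candidates are dropped
    return sorted(kept, key=lambda t: t.rank, reverse=True)


def toy_rank(raw_title: str, infohash: str) -> RankedTorrent:
    """Invented scoring rule; mirrors only the 40-character infohash check."""
    if len(infohash) != 40:
        raise GarbageTorrent("The infohash must be 40 characters in length.")
    return RankedTorrent(raw_title, infohash, 100 if "1080p" in raw_title else 50)


candidates = [
    ("Show S01E01 720p", "a" * 40),
    ("Show S01E01 1080p", "b" * 40),
    ("Show S01E01 480p", "short-hash"),
]
best_first = rank_all(toy_rank, candidates)
print([t.raw_title for t in best_first])
```

With a real RTN instance, `rtn.rank` could be passed in place of `toy_rank`, assuming GarbageTorrent is imported from the library instead of being redefined.
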
Source code in RTN/parser.py
def rank(self, raw_title: str, infohash: str, correct_title: str = "", remove_trash: bool = False, **kwargs) -> Torrent:
    """
    Parses a torrent title, computes its rank, and returns a Torrent object with metadata and ranking.

    Args:
        `raw_title` (str): The original title of the torrent to parse.
        `infohash` (str): The SHA-1 hash identifier of the torrent.
        `correct_title` (str): The correct title to compare against for similarity. Defaults to an empty string.
        `remove_trash` (bool): Whether to check for trash patterns and raise an error if found. Defaults to False.

    Returns:
        Torrent: A Torrent object with metadata and ranking information.

    Raises:
        ValueError: If the title or infohash is not provided.
        TypeError: If the title or infohash is not a string.
        GarbageTorrent: If the infohash is not 40 characters long, if the rank is below `settings.options["remove_ranks_under"]`, or, when `remove_trash` is True, if the title is identified as trash or falls below the title-similarity threshold.

    Notes:
        - If `correct_title` is provided, the Levenshtein ratio will be calculated between the parsed title and the correct title.
        - If `remove_trash` is True and the ratio is below the threshold, a `GarbageTorrent` error will be raised.
        - If no correct title is provided, the Levenshtein ratio will be set to 0.0.

    Example:
        ```python
        from RTN import RTN
        from RTN.models import SettingsModel, DefaultRanking
        from RTN.parser import ParsedData, Torrent

        settings_model = SettingsModel()
        ranking_model = DefaultRanking()
        rtn = RTN(settings_model, ranking_model)
        torrent = rtn.rank(
            "The Walking Dead S05E03 720p HDTV x264-ASAP[ettv]",
            "c08a9ee8ce3a5c2c08865e2b05406273cabc97e7",
            correct_title="The Walking Dead",  # without this, lev_ratio stays 0.0
        )
        assert isinstance(torrent, Torrent)
        assert isinstance(torrent.data, ParsedData)
        assert torrent.fetch
        assert torrent.rank > 0
        assert torrent.lev_ratio > 0.0
        ```
    """
    if not raw_title or not infohash:
        raise ValueError("Both the title and infohash must be provided.")

    if len(infohash) != 40:
        raise GarbageTorrent("The infohash must be a valid SHA-1 hash and 40 characters in length.")

    parsed_data: ParsedData = parse(raw_title) # type: ignore

    lev_ratio = 0.0
    if correct_title:
        aliases = kwargs.get("aliases", {})
        lev_ratio: float = get_lev_ratio(correct_title, parsed_data.parsed_title, self.lev_threshold, aliases)

    fetch: bool = check_fetch(parsed_data, self.settings)
    rank: int = get_rank(parsed_data, self.settings, self.ranking_model)

    if remove_trash:
        if not fetch:
            raise GarbageTorrent(f"'{raw_title}' has been identified as trash based on user settings and will be ignored.")
        if correct_title and lev_ratio < self.lev_threshold:
            raise GarbageTorrent(f"'{raw_title}' does not match the correct title, got ratio of {lev_ratio}")

    if rank < self.settings.options["remove_ranks_under"]:
        raise GarbageTorrent(f"'{raw_title}' does not meet the minimum rank requirement, got rank of {rank}")

    return Torrent(
        infohash=infohash,
        raw_title=raw_title,
        data=parsed_data,
        fetch=fetch,
        rank=rank,
        lev_ratio=lev_ratio
    )

parse(raw_title, translate_langs=False, json=False)

Parses a torrent title using PTN and enriches it with additional metadata extracted from patterns.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `raw_title` | str | The original torrent title to parse. | required |
| `translate_langs` | bool | Whether to translate the language codes in the parsed title. | `False` |
| `json` | bool | Whether to return the parsed data as a dictionary. | `False` |

Returns:

| Type | Description |
| --- | --- |
| ParsedData \| Dict[str, Any] | A data model containing the parsed metadata from the torrent title, or a dictionary when json is True. |

Example
from RTN.parser import parse

parsed_data = parse("Game of Thrones S08E06 1080p WEB-DL DD5.1 H264-GoT")
print(parsed_data.parsed_title) # 'Game of Thrones'
print(parsed_data.normalized_title) # 'game of thrones'
print(parsed_data.type) # 'show'
print(parsed_data.seasons) # [8]
print(parsed_data.episodes) # [6]
print(parsed_data.resolution) # '1080p'
print(parsed_data.audio) # ['DD5.1']
print(parsed_data.codec) # 'H264'
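
To give a feel for what pattern-based extraction involves, here is a deliberately tiny sketch that pulls a few of the same fields out of a title with two regular expressions. It is not RTN's parser and does not use its patterns: `toy_parse`, `SEASON_EPISODE`, and `RESOLUTION` are hypothetical names, and real titles need far more patterns than this.

```python
import re
from typing import Any, Dict

# Illustrative patterns only; these are assumptions, not RTN's own.
SEASON_EPISODE = re.compile(r"\bS(\d{1,2})E(\d{1,3})\b", re.IGNORECASE)
RESOLUTION = re.compile(r"\b(\d{3,4}p)\b", re.IGNORECASE)


def toy_parse(raw_title: str) -> Dict[str, Any]:
    """Extract title, season, episode, and resolution from a release name."""
    if not raw_title or not isinstance(raw_title, str):
        raise TypeError("The input title must be a non-empty string.")

    se = SEASON_EPISODE.search(raw_title)
    res = RESOLUTION.search(raw_title)

    # Treat everything before the first recognised token as the title.
    found = [m.start() for m in (se, res) if m]
    cut = min(found) if found else len(raw_title)
    title = raw_title[:cut].strip(" .-_")

    return {
        "parsed_title": title,
        "normalized_title": title.lower(),
        "seasons": [int(se.group(1))] if se else [],
        "episodes": [int(se.group(2))] if se else [],
        "resolution": res.group(1).lower() if res else None,
    }


result = toy_parse("Game of Thrones S08E06 1080p WEB-DL DD5.1 H264-GoT")
print(result)
```
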
Source code in RTN/parser.py
def parse(raw_title: str, translate_langs: bool = False, json: bool = False) -> ParsedData | Dict[str, Any]:
    """
    Parses a torrent title using PTN and enriches it with additional metadata extracted from patterns.

    Args:
        `raw_title` (str): The original torrent title to parse.
        `translate_langs` (bool): Whether to translate the language codes in the parsed title. Defaults to False.
        `json` (bool): Whether to return the parsed data as a dictionary. Defaults to False.

    Returns:
        `ParsedData`: A data model containing the parsed metadata from the torrent title.

    Example:
        ```python
        from RTN.parser import parse

        parsed_data = parse("Game of Thrones S08E06 1080p WEB-DL DD5.1 H264-GoT")
        print(parsed_data.parsed_title) # 'Game of Thrones'
        print(parsed_data.normalized_title) # 'game of thrones'
        print(parsed_data.type) # 'show'
        print(parsed_data.seasons) # [8]
        print(parsed_data.episodes) # [6]
        print(parsed_data.resolution) # '1080p'
        print(parsed_data.audio) # ['DD5.1']
        print(parsed_data.codec) # 'H264'
        ```
    """
    if not raw_title or not isinstance(raw_title, str):
        raise TypeError("The input title must be a non-empty string.")

    data: Dict[str, Any] = parse_title(raw_title, translate_langs)
    item = ParsedData(
        **data,
        raw_title=raw_title,
        parsed_title=data.get("title", ""),
        normalized_title=normalize_title(data.get("title", "")),
        _3d=data.get("3d", False)
    )

    return item if not json else item.model_json_schema()