[LeetCode] 819. Most Common Word (Python, Counter)

819. Most Common Word

Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.

The words in paragraph are case-insensitive and the answer should be returned in lowercase.

Note that words can not contain punctuation symbols.

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2:

Input: paragraph = "a.", banned = []
Output: "a"

Constraints:

1 <= paragraph.length <= 1000
paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i] consists of only lowercase English letters.


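Given a paragraph and a list of banned words (banned), the task is to find the most frequent word that is not banned.

The main conditions of the problem are as follows.

- Words are case-insensitive
- The paragraph contains spaces and punctuation such as periods, commas, and exclamation marks
- Words consist only of letters
- Punctuation must be ignored even when it is adjacent to a word
- The answer is guaranteed to be unique
- The result must be returned in lowercase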
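Taken together, these conditions describe a normalization step: lowercase everything and treat every punctuation symbol as a separator. For comparison with the regex used in the solutions, here is a minimal sketch of that step using `str.translate` instead. The helper name `tokenize` is hypothetical, and the symbol set is copied from the constraint `"!?',;."`.

```python
from typing import List

# Punctuation symbols allowed by the problem's constraints
PUNCTUATION = "!?',;."


def tokenize(paragraph: str) -> List[str]:
    """Hypothetical helper: lowercase, turn each punctuation symbol into a space, split."""
    table = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))
    return paragraph.lower().translate(table).split()


print(tokenize("Bob hit a ball, the hit BALL flew far after it was hit."))
# ['bob', 'hit', 'a', 'ball', 'the', 'hit', 'ball', 'flew', 'far', 'after', 'it', 'was', 'hit']
print(tokenize("a."))  # ['a']
```

Because the constraints say the paragraph contains only letters, spaces, and these six symbols, this gives the same tokens as extracting runs of letters with a regex.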

✔Solution

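```python
import re
from collections import defaultdict
from typing import List  # LeetCode pre-imports this; needed when running locally


class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # word -> number of occurrences (missing keys start at 0)
        word_count = defaultdict(int)
        # Extract runs of letters; punctuation and spaces are skipped
        words = re.findall(r'[a-zA-Z]+', paragraph)

        for word in words:
            # Compare and count in lowercase, since matching is case-insensitive
            if word.lower() not in banned:
                word_count[word.lower()] += 1

        # Return the key with the largest count
        return max(word_count, key=word_count.get)
```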

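`[a-zA-Z]+`
→ matches a run of one or more consecutive English letters
- punctuation (, . ! ? ' ;) and spaces are dropped automatically
- the result is a list of words
`max(word_count, key=word_count.get)` returns the key (word) whose value (occurrence count) is largest.
Since the problem guarantees a unique answer, no tie-breaking is needed.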
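To see both steps on Example 1, the snippet below prints what `re.findall` returns and which key `max(..., key=word_count.get)` picks. It repeats the solution's logic outside the class, with `banned = ["hit"]` hard-coded for illustration.

```python
import re
from collections import defaultdict

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

# re.findall returns every maximal run of letters; original casing is kept
words = re.findall(r'[a-zA-Z]+', paragraph)
print(words)
# ['Bob', 'hit', 'a', 'ball', 'the', 'hit', 'BALL', 'flew', 'far', 'after', 'it', 'was', 'hit']

word_count = defaultdict(int)
for word in words:
    if word.lower() not in banned:
        word_count[word.lower()] += 1

# max iterates over the dict's keys and compares them by word_count.get(key)
print(max(word_count, key=word_count.get))  # ball
print(word_count["ball"])  # 2
```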

💡Solution using Counter

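```python
import re
from collections import Counter
from typing import List  # LeetCode pre-imports this; needed when running locally


class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # A set gives average O(1) membership checks
        banned_set = set(banned)
        # Lowercase first, then extract runs of letters
        words = re.findall(r'[a-zA-Z]+', paragraph.lower())

        # Count only the words that are not banned
        counter = Counter(word for word in words if word not in banned_set)

        # most_common(1) -> [(word, count)]; take the word
        return counter.most_common(1)[0][0]
```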

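Using a set and Counter makes the solution shorter and more efficient (with at most 100 banned words, the speedup is small in practice).
- Convert the banned list to a set → each `not in` check becomes O(1) on average instead of O(len(banned))
- Use the regex to extract only the words from the lowercased paragraph and store them in `words`
- Count the words that are not in `banned_set` with `Counter`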
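A small check of the filtering step on Example 1: banned words never enter the `Counter`, and looking up a missing key on a `Counter` returns 0 instead of raising `KeyError`.

```python
import re
from collections import Counter

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned_set = set(["hit"])  # hash-based set: average O(1) membership test

words = re.findall(r'[a-zA-Z]+', paragraph.lower())
counter = Counter(word for word in words if word not in banned_set)

print(counter["ball"], counter["hit"])  # 2 0
```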
What most_common does
A method that returns elements ordered from most to least frequent
Usage
```
counter.most_common(n)
```
- n → how many of the most common elements to return
- Return value → a list of (element, count) tuples
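A quick illustration of the `n` parameter on a made-up list. Calling `most_common()` with no argument returns every element, and elements with equal counts keep the order in which they were first encountered.

```python
from collections import Counter

c = Counter(["b", "a", "b", "c", "a", "b"])

print(c.most_common(2))  # [('b', 3), ('a', 2)]
print(c.most_common())   # [('b', 3), ('a', 2), ('c', 1)]

# Ties: equal counts stay in first-encountered order
print(Counter(["y", "x"]).most_common())  # [('y', 1), ('x', 1)]
```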
```
counter.most_common(1)[0][0]
```
Step-by-step breakdown
1. most_common(1)
  → [('ball', 2)]
2. [0]
  → ('ball', 2)
3. [0]
  → 'ball'
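The same three steps written out one at a time, plus an equivalent form using tuple unpacking. The `Counter` here is built by hand from part of Example 1's counts, for illustration only.

```python
from collections import Counter

counter = Counter({"ball": 2, "bob": 1, "a": 1})

top = counter.most_common(1)  # [('ball', 2)]
pair = top[0]                 # ('ball', 2)
word = pair[0]                # 'ball'

# Equivalent: unpack the single (word, count) tuple directly
[(word2, count)] = counter.most_common(1)

print(word, word2, count)  # ball ball 2
```

On an empty `Counter`, `most_common(1)` returns `[]`, so `[0]` would raise `IndexError`; the problem's guarantee that at least one word is not banned rules this out.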