When an agent learns using Q-learning, it faces a trade-off between exploration and exploitation: between gathering new information and acting on what it already knows.
Exploration involves taking actions to gather new information about the environment, even if those actions may not yield the highest immediate reward.
Exploitation involves taking actions that maximize the expected reward based on current knowledge.
Q-learning balances these two aspects, typically using techniques like epsilon-greedy exploration to ensure the agent explores sufficiently while still favoring high-value actions.
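A minimal sketch of epsilon-greedy action selection combined with the standard Q-learning update may make this concrete. The tabular Q-table, state and action counts, and the epsilon, alpha, and gamma values here are illustrative assumptions, not prescribed by any particular library:

```python
import numpy as np

def epsilon_greedy_action(q_table, state, epsilon, rng):
    """With probability epsilon take a random action (explore);
    otherwise take the highest-value action for this state (exploit)."""
    n_actions = q_table.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore: uniform random action
    return int(np.argmax(q_table[state]))     # exploit: current best estimate

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])

# Illustrative usage with made-up sizes: 5 states, 2 actions.
rng = np.random.default_rng(0)
q_table = np.zeros((5, 2))
action = epsilon_greedy_action(q_table, state=0, epsilon=0.1, rng=rng)
q_update(q_table, state=0, action=action, reward=1.0, next_state=1)
```

With epsilon around 0.1, the agent exploits its current estimates roughly 90% of the time while still sampling other actions often enough to keep improving them; a common refinement is to decay epsilon over time as the estimates become more reliable.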