Response Correctness Evaluation Voice and Text (Condensed Version)
Claim Splitting and Fact Checking
Version: 1.0 | Prepared by: KCAE |
Date: November 7, 2025 | Approved By: |
Version History
Version | Date | Description of changes | Initials |
v1.0 | Nov.7, 2025 | Condensed Version for Onboarding New DAs | KCAE |
Note: We use the terms question/query/user request interchangeably in this guideline and they all refer to the input from the user.
Step 1: Response Evaluation
The tool may provide multiple candidate answers for each utterance. A candidate response can be a short sentence, a long extended paragraph, or a request for clarification, creative writing, instructions on how to complete a task, a list of products etc. For each candidate response, evaluate the following:
Step 1.1 Is the response a DEFLECTION?
A response is a deflection when the system did not provide an answer to the query, which is usually because of system errors or limitations.
Examples of deflection phrases:
"I'm sorry",
"I apologize",
"I am sorry",
"FAILED TO CAPTURE RESPONSE",
"Sorry but",
"I don't have",
"I am unable",
"I'm unable",
"Alexa+ is experiencing an interruption in service",
"Alexa+ system is temporarily unavailable",
"system is temporarily unavailable"
Do not label this step if the query was found unintelligible, ambiguous, with harmful intent, or is not seeking information.
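As a rough aid for this step, a response can be screened against the phrase list above. The sketch below is an illustration only and is not part of the tool: the function name `find_deflection_phrases` and the case-insensitive substring matching are assumptions. A match only flags a response for review, because a response may open with an apology and still go on to provide an answer.

```python
# Phrases copied from the list of example deflection phrases in this guideline.
DEFLECTION_PHRASES = [
    "I'm sorry",
    "I apologize",
    "I am sorry",
    "FAILED TO CAPTURE RESPONSE",
    "Sorry but",
    "I don't have",
    "I am unable",
    "I'm unable",
    "Alexa+ is experiencing an interruption in service",
    "Alexa+ system is temporarily unavailable",
    "system is temporarily unavailable",
]


def find_deflection_phrases(response: str) -> list[str]:
    """Return the listed phrases found in the response (hypothetical helper).

    Matching is case-insensitive, and curly apostrophes are normalized to
    straight ones so that "I don’t have" matches "I don't have".
    """
    normalized = response.replace("\u2019", "'").lower()
    return [p for p in DEFLECTION_PHRASES if p.lower() in normalized]


flagged = find_deflection_phrases(
    "I apologize, but I'm currently unable to access the information about Cheerios."
)
print(flagged)
```

An empty result does not prove that the response answered the query, and a non-empty result does not prove it is a deflection; the annotator still makes the final call.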
Step 1.2 Is the response RELEVANT?
A relevant answer should provide information to the user that directly addresses the question being asked. At this step, there’s no need to fact-check the response.
Consider the following factors in assessing the relevance of the response:
Relevance: A relevant answer should provide information that is directly related to the topic or subject matter of the question.
Specificity: A relevant answer should not be too general or vague, but should provide specific information to the user.
Timeliness: The relevance of an answer may be affected by the current time, location or other contextual factors.
Yes, if the system provided a relevant and specific response.
No, if the response failed to address the user’s request. Choose from the following reasons for irrelevancy. More than one label is allowed.
Not relevant, if the response provides information that is not related to the topic of the query.
Prompt: why does my cat attack me out of the blue
Response:
A dog may "attack out of the blue" due to a medical issue, fear, stress, or resource guarding, often triggered by subtle signs of discomfort that were missed.
Not specific, if the response is on topic but too general or vague to answer the query.
Prompt: why does my cat attack me out of the blue
Response:
Cats can sometimes exhibit sudden aggressive behavior for various reasons.
Not timely, if the query asked for a specific date but the system provided outdated or different information. However, if no date is indicated in the query and the system provided a relevant, specific, but outdated response, consider it relevant; it must be fact-checked, and the outdated claim should be negated in the fact-checking process.
Example of Not timely response (no fact-checking)
Prompt: what day is Mother’s Day celebrated on 2026
Answer Date: November 5, 2025
Answer: Mother’s Day was celebrated on May 11, 2025.
Example of Relevant response (to be fact-checked and negated with timely information)
Prompt: when is Mother’s Day
Answer Date: November 5, 2025
Answer: Mother’s Day will be celebrated on May 11, 2025.
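The two Mother’s Day examples can be summarized as a small decision rule. The following is a minimal sketch, assuming the dates have already been reduced to years by the annotator; the function name `timeliness_label` and the year-only comparison are simplifications introduced here for illustration and are not defined by the guideline.

```python
from typing import Optional


def timeliness_label(query_year: Optional[int], answer_year: int) -> str:
    """Hypothetical helper mirroring the two examples above.

    query_year is None when the query does not indicate a date.
    """
    if query_year is not None and answer_year != query_year:
        # The query asked for a specific date and the answer is about another one.
        return "Not timely"
    # No date in the query (or the dates match): treat as relevant and fact-check.
    return "Relevant"


# "what day is Mother's Day celebrated on 2026" answered with the 2025 date
print(timeliness_label(2026, 2025))
# "when is Mother's Day" answered with the 2025 date
print(timeliness_label(None, 2025))
```

In the second case the response is still fact-checked, and an outdated claim is negated at that stage.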
Do not label this step if the query was found unintelligible, ambiguous, with harmful intent, or is not seeking information.
If the evaluation “Is the response relevant” was answered No, the process stops here.
Step 1.4 Is the response CORRECT?
A correct answer should be informative and provide valid information to the user that directly addresses the question being asked.
Step 1.2.1 Enhanced Fact-checking
Step 1.2.1.1 Claims Identification
Step 1: We need to identify all claims within the response. A claim is a text segment containing a statement of fact that can be proved or disproved with evidence. A response may consist of zero to many claims.
There can be multiple claims within a sentence. A claim may also span multiple sentences. For example: “All birds fly” is a claim. A piece of text that makes a topic introduction or an overall conclusion is not considered a claim, for example, “Here is a list of birds that cannot fly”. Common knowledge does not need to be identified as a claim (e.g. “Sleep is important to our overall health”) unless it is about something that has been widely discredited or invalidated (e.g. “Everyone needs 8 hours of sleep a day”).
Additional Guidance: Understanding Recommendations, Opinions, and Non-Claims
Recommendations and opinions reflect personal views or judgments and are not factual claims.
These statements depend on subjective factors like reviews, ratings, or personal preferences, which can vary widely.
When annotating, focus only on factual claims about products, places, or people. Do not treat subjective terms like “best,” “worst,” or “most popular” as claims because they depend on personal or source-based criteria.
Checklist for Identifying Claims vs. Non-Claims
You should ask:
Does it depend on objective facts, not personal opinions or preferences?
Does it avoid subjective terms like “best,” “worst,” or “most popular”?
If all answers are yes, treat it as a factual claim.
If any answer is no, it is likely a recommendation or opinion and should not be considered as a claim.
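The second checklist question can be partly mechanized by flagging the subjective terms named above. The sketch below is only an illustration: `subjective_terms_in` is a hypothetical helper, and the term list contains just the three examples this guideline mentions. A flagged sentence still needs the annotator's judgement on the first question.

```python
import re

# Only the example terms named in this guideline; extend with care.
SUBJECTIVE_TERMS = ("best", "worst", "most popular")

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SUBJECTIVE_TERMS) + r")\b",
    re.IGNORECASE,
)


def subjective_terms_in(sentence: str) -> list[str]:
    """Return the subjective terms found in a sentence, lowercased."""
    return [match.lower() for match in _PATTERN.findall(sentence)]


for sentence in (
    "This hotel is the best in the city.",
    "It has a rooftop pool and free breakfast.",
):
    terms = subjective_terms_in(sentence)
    verdict = "likely opinion" if terms else "candidate factual claim"
    print(f"{sentence!r}: {verdict} {terms}")
```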
Claim-Splitting (Voice-Text)
EXAMPLES | CLAIM-SPLITTING |
Prompt: what lake does chicago border Response: Chicago is located on the southwestern shore of Lake Michigan, one of the five Great Lakes of North America. This massive freshwater lake spans all 28 miles of Chicago's shoreline, offering beautiful views and recreational opportunities for residents and visitors alike. Lake Michigan plays a vital role in Chicago's economic, social, and recreational life, serving as a scenic backdrop for the city's iconic skyline. | - Chicago is located on the southwestern shore of Lake Michigan - one of the five Great Lakes of North America. - This massive freshwater lake spans all 28 miles of Chicago's shoreline, - offering beautiful views and recreational opportunities for residents and visitors alike. - Lake Michigan plays a vital role in Chicago's economic, social, and recreational life, - serving as a scenic backdrop for the city's iconic skyline. |
Prompt: what does the word susceptible mean Response: The word 'susceptible' means easily influenced, affected, or harmed by something. It describes a vulnerability or sensitivity to certain influences, like illnesses or environmental factors. | - The word 'susceptible' means easily influenced, affected, or harmed by something. - It describes a vulnerability or sensitivity to certain influences, - like illnesses or environmental factors. |
Prompt: what vitamin does apple have Response: Apples are a nutritious fruit that contains several important vitamins. According to the most recent factual information:\n\n * Vitamin C: 14% of the daily recommended intake\n * Vitamin A: 2% of the daily recommended intake\n* Vitamin B6: 4% of the daily recommended intake\n* Vitamin K: 5% of the daily recommended intake\n\n Did you know that apples are also a good source of dietary fiber and potassium? They're a well-rounded fruit that can contribute to a healthy diet. | - Vitamin C: 14% of the daily recommended intake - Vitamin A: 2% of the daily recommended intake - Vitamin B6: 4% of the daily recommended intake - Vitamin K: 5% of the daily recommended intake - apples are also a good source of dietary fiber and potassium |
This hotel is the best in the city. It has a rooftop pool and free breakfast. | Claim: It has a rooftop pool and free breakfast. Non-claim: This hotel is the best in the city |
Prompt: the best player on the packers team Response date: 07-14-2025 Response: Rashan Gary is currently considered the best player on the Green Bay Packers team. He's a standout defensive end who recently made NFL.com's annual ranking of top 100 players. Prompt: who is the best center fielder of all time Response date: 07-19-2025 Response: Willie Mays is widely considered the greatest center fielder of all time. He played 2,829 games in center field, more than any other player in history. | Factual Claim: If multiple credible sources (e.g., official records, verified sports statistics, recognized experts) consistently agree on the information, mark it as a factual claim. Example: "Steph Curry holds the record for the most 3-pointers made in a season" is verifiable through official NBA statistics. Opinion/Recommendation: If the statement reflects a personal view or cannot be objectively measured and could vary from person to person (e.g., “Steph Curry is the best shooter ever”), it’s an opinion. |
Step 1.2.1.2 Claims Verification
Step 2a: In step 2, we need to verify the factual accuracy of each claim. You need to refer to the input context (if available) and query to make your assessment, by verifying the factuality of the claims using trustworthy sources online. There can be claims that are time-sensitive and are true at the date and time of the input context. For each claim identified in step 1, select:
Reminder: Always set your Google default location to US.
Go to Google page >> click Settings >> Search settings >> Other settings >> Language & Region >> Results region >> Choose United States.
Cases requiring flexible judgement
Some claims may not have a single “correct” value but still fall within an acceptable range. In these cases, a claim should be marked as Correct if it:
Accurately reflects the state of events on the same day, or
Falls within a reasonable range according to credible sources.
This applies to the following types of claims:
Dynamic Financial Indicators: These values can fluctuate rapidly and are often time-specific. Examples include:
Stock prices
Individual net worth
Exchange rates and interest rates
Real-time rates for services (e.g., flights, Ubers, hotels)
Market capitalization of companies
Live Events: These should be considered correct if the response matches the event status at any point on the referenced day. Examples include:
Sports scores
Election results
Poll ratings (e.g., presidential approval ratings, election polling, TV show viewer ratings, etc.)
Measurements Based on Estimates: Such values vary by size, quality, preparation, or source and should be judged with reasonable flexibility. Examples include:
Nutrient content in food
Price of rare items (e.g., minerals, rare coins, exotic cars, fine art)
Construction costs
Time to complete a task (e.g., “It takes 2 hours to hike this trail”)
Business Attributes: For details such as phone numbers, opening hours, accessibility, menu items/pricing, or accepted payment methods, fact-check using the sources below in the following order. Mark the response as correct if:
The attribute matches the business’s official website, or
If no official website is available, the attribute matches any one of the next three sources
Relative Geographic Location: Do not penalize minor directional inaccuracies in responses (e.g., “north” instead of “northeast”) if the response is still broadly accurate.
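The "reasonable range" rule for dynamic financial indicators and estimate-based measurements can be pictured as a simple range test. This is a minimal sketch, assuming the range is taken as the lowest and highest values reported by the credible sources consulted; the helper name `within_source_range` is hypothetical, and the guideline does not prescribe an exact tolerance. The figures in the usage lines come from this guideline's Table 8 examples.

```python
def within_source_range(claimed: float, source_values: list[float]) -> bool:
    """Return True if the claimed value lies between the lowest and highest
    values reported by the sources (inclusive)."""
    if not source_values:
        raise ValueError("at least one source value is required")
    return min(source_values) <= claimed <= max(source_values)


# Bitcoin on May 23 2025: claimed $108,800; day's low and high from sources
print(within_source_range(108_800, [106_841.30, 111_798.91]))
# Cholesterol in one pat of butter: claimed 10.8 mg; sources give 10.75 and 11 mg
print(within_source_range(10.8, [10.75, 11]))
# Pack of cigarettes in 1984: claimed $0.80; sources give $0.91, $0.72, $1.19
print(within_source_range(0.80, [0.91, 0.72, 1.19]))
```

A value outside the range is not automatically incorrect; it signals that the annotator should look more closely at the sources and the date they refer to.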
Table 8: Cases requiring flexible judgement
Type of Claims | Example Queries | Example Responses | What sources say | Rationale |
Dynamic financial indicators | what is the value of one bitcoin (on May 23 2025) | One Bitcoin is currently worth $108,800, which is down 2.4% over the last 24 hours. | $106,841.30 - $111,798.91 | Claims are between the high and low for the day, therefore should be graded as correct |
| price of solana crypto (on May 23 2025) | The current price of Solana is $180.90, up less than 1% over the last 24 hours. | $173.57-$187.28 |
| who is the wealthiest man on earth (as of June 8 2025) | Elon Musk is currently the wealthiest person in the world, with a net worth of $342 billion. He's the CEO of Tesla and SpaceX. | Musk remains the richest person on the planet with an estimated total net worth of $342 billion as of Sunday, June 8. |
| what is the price of microsoft (as of June 10 2025) | Microsoft's stock is currently trading at $468.47 on the NASDAQ, down 0.9% since the previous close. | $466.96 - $472.80 |
Measurements based on estimates | does butter have cholesterol | Yes, butter does contain cholesterol. According to recent information, one pat of butter has about 10.8 milligrams of cholesterol. | 10.75mg for 1 pat of butter; 11mg for 1 pat of butter | Claims are within the range of answers on the internet and should therefore be graded as correct |
| how much was a pack of cigarettes in 1984 | In 1984, the average price of a pack of cigarettes was about 80 cents. This was significantly lower than today's prices, which average around $6.16 per pack. | 91 cents a pack in 1984; 72 cents a pack in 1984; $1.19 a pack in 1984 |
| what is a healthy weight for a 50 year old woman 5'8" | Based on the information available, a healthy weight range for a 50-year-old woman who is 5'8" tall is typically between 125 and 164 pounds. This range is based on a moderate body mass index (BMI) of 18.5 to 24.9, which is generally considered healthy for most adults. | 126 - 154 lbs; 125–164 lbs |
| how much does it cost to fly air force one | Based on recent reports, the cost to fly Air Force One ranges from about $177,000 to $200,000 per hour. This includes fuel, crew, and maintenance expenses. | about $177,843 per flight hour in 2022; about $200,000 per hour in 2025 |
| how many calories does a red apple have | A medium red apple typically contains about 115 calories. This can vary slightly depending on the specific type and size of the apple. | A medium red apple ranges from 70 to 125 calories according to various sources. |
Live events | What’s Trump’s approval rating (As of June 14 2025) | President Trump's approval rating is 46.7% as of yesterday, according to RealClearPolitics. This represents a slight increase of 0.8% over the last month. | The claim is accurate as per source mentioned. Other sources give slightly different numbers around the same figure | If the state in question was true at some point during the particular day, the response should be graded as correct |
| Who won the US opens (June 12 2025) | The 2025 U.S. Open is currently underway. J.J. Spaun is leading at 4 under par, with Si Woo Kim, Gary Woodland, and Thomas Detry tied for second at 2 under par. | Question potentially asked while the game was ongoing and the response reflected the score |
Local attributes for businesses | northern tier library opens when | The Northern Tier Library opens at 9:00 AM on Tuesdays. | Google Maps gives 9AM-7PM on Tuesdays; Library website also gives 9AM-7PM on Tuesday; This (outdated, 2020) news site gives 10AM-6PM | Should be graded as correct as it aligns with the website |
| What time does Goat Tree close? | Goat tree in Santa Barbara is open until 5PM on Thursdays | Website gives dine-in closes at 5PM but Grab and Go closes at 8PM; Google Maps gives 5PM; Yelp gives 5PM; TripAdvisor gives 7.30PM; OpenTable gives 7.30PM | Should be graded as correct (any answer that gives 5PM or 8PM as a response should be correct) |
| Contact number for Goat Tree restaurant | 805-882-0137 | Website, Google and Yelp give 805-882-0137; TripAdvisor gives 805-882-0100 | Should be graded as correct (TripAdvisor number should be considered incorrect) |
Relative geographic location | Where in the world is Ukraine? | Ukraine is located in Eastern Europe. It's bordered by Russia to the east, Belarus to the north, Poland, Slovakia, and Hungary to the west, and Romania, Moldova, and the Black Sea to the south | Romania and Moldova are mostly to the southwest and south-southwest of Ukraine, not strictly directly south. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
| where is the dead sea | The Dead Sea is located in southwestern Asia, bordered by Jordan to the east and Israel and the West Bank to the west. It's a landlocked salt lake, known for being the lowest point on Earth's surface, at 439.78 meters below sea level. | Israel is more to the south-west than the west, but the statement would not be considered factually wrong. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
| Where is the Himalayas mountain range? | The Himalayas stretch across southern Asia, running along the northern border of India and separating it from China and Nepal. | Nepal is mostly within the Himalayas rather than just separated by them. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
| Where is the Philippines located? | The Philippines is in Southeast Asia, east of Vietnam and south of Taiwan. | It's more southeast of Taiwan than directly south. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
Additional Fact-checking Examples and Reminders |
Feedback | Category | Examples |
If the response mentions a specific time with corresponding values, points, and percentages, use the Yahoo Finance day's range for the stock indices. | Business / Finance | Prompt: what is the stock price of tesla Date of Query: 2025-07-02 Response: As of 8:48 AM today, Tesla's stock is trading at $313.68 on the NASDAQ, up 4.3% since the previous close. This price was observed at 9:07 AM Pacific Daylight Time on Wednesday, July 2, 2025.
Claim: As of 8:48 AM today, Tesla's stock is trading at $313.68 on the NASDAQ
Evidence on Yahoo Finance: https://ca.finance.yahoo.com/quote/TSLA/history/
NasdaqGS - Nasdaq Real Time Price USD Tesla, Inc. (TSLA)
Jul 2, 2025 High 316.83 Low 303.82 Close 315.65
Jul 1, 2025 Close 300.71
Comment: Factually correct. $313.68 is within the day range of $303.82 - $316.83.
Claim: up 4.3% since the previous close
Computation: [(Claim - Prev. Closing Price) / Prev. Closing Price] x 100%
(313.68 - 300.71) ÷ 300.71 × 100 = 12.97 ÷ 300.71 × 100 ≈ 4.31, i.e. up by about 4.3%
Comment: The percentage change is factually correct.
Note: If the points difference or percentage change is incorrect, include the correct computed value in the evidence section. Do not include the computation in the evidence.
|
Mark the claims as not timely if the tenses do not align with the query date.
| Structure of the Response | Prompt: how many more seasons of the simpsons will there be Response date: 2025-03-31
Response: The Simpsons has been renewed for seasons 35 and 36, which will air through the 2024-25 season. While there's no official word on further renewals, the show remains popular and could continue beyond that.
Note: At the query date, Season 35 had already concluded while Season 36 was ongoing, so it should be marked as not timely.
|
As long as the response is relevant, fact-check it. Right deflection: mark it as correct. Wrong deflection: not relevant and not specific. Goal: the factuality of the responses; limitations of the model (privacy) should not negatively impact the response.
| Deflections | a. Prompt: are cheerios good Response date: 2025-07-12
Response: I apologize, but I'm currently unable to access the information about Cheerios. Is there anything else I can help you with?
b. Prompt: news on president trump Response date: 2025-07-14
Response: I'm sorry, I can't look up real-time news right now. Is there something specific about President Trump you'd like to know?
a-b. Relevant - No (Not relevant and not specific). No fact-check needed.
c. Prompt: is there a dodgers game today Response date: 2025-07-11
Response: I don't have information about the Dodger’s game today..
d. Prompt: has p diddy been sentenced Response date: 2025-07-11
Response: I don't have current information about P Diddy's legal status..
Relevant - Yes. If information is available, provide the answer to the prompt and make sure the evidence falls within the date of the query. Mark the response as incorrect, major issue, core fact incorrect. If no information can be found on the web, treat the deflection as correct.
e. Prompt: how much does a cane corso puppy cost
Response: I apologize, but I'm currently unable to retrieve the most up-to-date information on Cane Corso puppy prices. Prices can vary widely depending on factors like breeder reputation, location, and the puppy's pedigree. Generally, Cane Corso puppies from reputable breeders can be quite expensive due to their popularity and the costs associated with responsible breeding practices.\n\n Here's some general information about Cane Corso puppies:\n\n 1. **Price Range**: Typically, Cane Corso puppies can cost anywhere from $1,000 to $4,000 or more.\n\n 2. **Factors Affecting Price**:\n - Bloodline and pedigree\n - Breeder reputation\n - Geographic location\n - Color and physical characteristics\n - Health certifications and genetic testing\n\n 3. **Additional Costs**: Remember to factor in ongoing expenses such as:\n - Food (Cane Corsos are large dogs and eat a lot)\n - Veterinary care\n - Training\n - Supplies (bed, crate, toys, etc.)\n\n 4. **Adoption Alternative**: Consider adopting from a Cane Corso rescue organization, which can be less expensive and gives a home to a dog in need.\n\n 5. **Research Thoroughly**: Due to their size and temperament, Cane Corsos require experienced owners. Ensure you're prepared for the commitment before purchasing.\n\n Would you like more information about Cane Corsos in general, or are you interested in specific aspects of owning this breed?
Relevant - Yes. Exclude the deflection part of the response and claim-split the valid claims. Fact-check the valid claims as is.
|
Provide the latest information to negate outdated claims. | Others | Prompt: where is st lawrence university Answer Date: 3/27/25
Response: St. Lawrence University is located in Canton, New York. It's a private liberal arts college founded in 1856, situated on a 1,100-acre campus in a rural setting. The university offers over 40 majors across various disciplines and has an enrollment of around 2,250 students.
Claim: situated on a 1,100-acre campus in a rural setting. Evidence used: It has a total undergraduate enrollment of 2,060 (fall 2023), and the campus size is 1,100 acres.
Available information: https://www.stlawu.edu/offices/institutional-research/slu-quick-facts
Statistics for Fall 2024 as of October 1 Student Enrollments Headcount Overall Enrollment 1,991 |
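The Tesla stock example in the table above involves two checks: whether the quoted price falls within the day's range, and whether the stated percentage change matches the computed one. The sketch below reproduces both with the figures from that example. The function names and the choice to compare percentages after rounding to one decimal place are assumptions made for illustration, not rules stated by the guideline.

```python
def within_day_range(price: float, day_low: float, day_high: float) -> bool:
    """True if the quoted price lies within the day's low-high range."""
    return day_low <= price <= day_high


def percent_change(price: float, prev_close: float) -> float:
    """[(Claim - Prev. Closing Price) / Prev. Closing Price] x 100"""
    return (price - prev_close) / prev_close * 100


def matches_stated_change(price: float, prev_close: float,
                          stated_pct: float, decimals: int = 1) -> bool:
    """Compare the computed and stated change after rounding (assumed tolerance)."""
    return round(percent_change(price, prev_close), decimals) == round(stated_pct, decimals)


# Figures from the Tesla example (July 2, 2025)
claimed_price = 313.68
prev_close = 300.71          # Jul 1, 2025 close
day_low, day_high = 303.82, 316.83

print(within_day_range(claimed_price, day_low, day_high))
print(round(percent_change(claimed_price, prev_close), 2))
print(matches_stated_change(claimed_price, prev_close, stated_pct=4.3))
```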
Step 2b: (skip this step and mark it as null or None if your response to the claim in step 2a was “Correct”). Assess the magnitude of the claim inaccuracy if the claim was labeled as incorrect. Assess the magnitude of the fact omission if the claim was labeled as partially correct. Assess the significance of the claim if the claim was labeled as inconclusive.
Minor if most readers would not notice the error, find it jarring or deem it significant. If printed in a newspaper, the newspaper may not need to print a correction.
Major if most readers knowledgeable in the space would likely recognize the error. If printed in a newspaper, the newspaper would have to print a correction or retraction to maintain its reputation.
Inconclusive if you found conflicting information online that both supports and contradicts the claim, leaving you uncertain whether the claim is true or false.
Step 1.2.1.3 Core Answer Identification
Step 3a: In step 3, we need to identify all the core answers among the claims in the response. A core answer is the main idea or the defining aspect that addresses the query. There can be multiple claims labeled as core answers. For each claim, select:
Example:
Prompt: how old is elton john
Response Date: 11-05-2025
Elton John is 78 years old. He was born on March 25, 1947.
> The core claim is - Elton John is 78 years old
Step 1.2.1.4 Reason for the Incorrectness
Step 4a: In step 4, we need to label the grounds on which the response is incorrect, based on all the false claims found. One or more labels can be used. After fact-checking the whole response, select from the following reasons if it was found incorrect:
Core fact incorrect - if the main answer to the query is false
Additional facts incorrect - if the additional information to the query or topic is false
Not timely - if the response contains outdated information
Not relevant - if the response has claims with information not related to the main topic
Not specific - if the response has provided relevant information but does not specifically answer the query.
Step 1.2.1.5 Link to source of information
Step 5a: For each claim marked as Correct or Incorrect, include the main source link used to support your decision.
Reminder: Wikipedia can be used as evidence, but do not provide information from unreliable sources such as Medium, Reddit, Quora, blogs, social media sites, AI tools / LLMs, etc. Please refer to the main guidelines for the complete list. Response Correctness Evaluation: Voice and Text.docx
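Two of the rules in Steps 2b and 5a lend themselves to a quick consistency check on a finished annotation: the magnitude is left empty when the claim is Correct, and a source link accompanies every claim marked Correct or Incorrect. The following is a minimal sketch under stated assumptions: the `ClaimAnnotation` record, its field names, and `validation_errors` are hypothetical and do not reflect the actual tool's schema, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimAnnotation:
    """Hypothetical record for one annotated claim (not the tool's schema)."""
    claim: str
    verdict: str                      # e.g. "Correct", "Incorrect"
    magnitude: Optional[str] = None   # e.g. "Minor", "Major"; None when skipped
    is_core: bool = False
    source_url: Optional[str] = None


def validation_errors(annotation: ClaimAnnotation) -> list[str]:
    """Check the two rules described in the lead-in; return any violations."""
    errors = []
    if annotation.verdict == "Correct" and annotation.magnitude is not None:
        errors.append("magnitude must be empty when the verdict is Correct")
    if annotation.verdict in ("Correct", "Incorrect") and not annotation.source_url:
        errors.append("a source link is required for Correct and Incorrect verdicts")
    return errors


ok = ClaimAnnotation(
    claim="Elton John is 78 years old",
    verdict="Correct",
    is_core=True,
    source_url="https://example.com/source",  # placeholder, not a real source
)
missing_source = ClaimAnnotation(claim="Example claim", verdict="Incorrect", magnitude="Major")

print(validation_errors(ok))
print(validation_errors(missing_source))
```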
Correctness Evaluation Voice and Text (Condensed Version)
Claim Splitting and Fact Checking
Version: 1.0 | Prepared by: KCAE |
Date: November 7, 2025 | Approved By: |
Version History
Version | Date | Description of changes | Initials |
v1.0 | Nov.7, 2025 | Condensed Version for Onboarding New DAs | KCAE |
Note: We use the terms question/query/user request interchangeably in this guideline and they all refer to the input from the user.
Step 1: Response Evaluation
The tool may provide multiple candidate answers for each utterance. A candidate response can be a short sentence, a long extended paragraph, or a request for clarification, creative writing, instructions on how to complete a task, a list of products etc. For each candidate response, evaluate the following:
Step 1.1 Is the response a DEFLECTION?
A response is a deflection when the system did not provide an answer to the query, which is usually because of system errors or limitations.
Example of deflection phrases:
"I'm sorry",
"I apologize",
"I am sorry",
"FAILED TO CAPTURE RESPONSE",
"Sorry but",
"I don't have",
"I am unable",
"I'm unable",
"Alexa+ is experiencing an interruption in service",
"Alexa+ system is temporarily unavailable",
"system is temporarily unavailable
Do not label this step if the query was found unintelligible, ambiguous, with harmful intent, or it is not seeking for information.
Step 1.2 Is the response RELEVANT?
A relevant answer should provide information to the user that directly addresses the question being asked. At this step, there’s no need to fact-check the response.
Consider the following factors in assessing the relevance of the response:
Relevance: A relevant answer should provide information that is directly related to the topic or subject matter of the question.
Specificity: A relevant answer should not be too general or vague, but should provide specific information to the user.
Timeliness: The relevance of an answer may be affected by the current time, location or other contextual factors.
Yes, if the system provided a relevant and specific response.
No, if the response failed to address the user’s request. Choose the following for the reason of irrelevancy. More than one label is allowed.
Prompt: why does my cat attack me out of the blue
Response:
A dog may "attack out of the blue" due to a medical issue, fear, stress, or resource guarding, often triggered by subtle signs of discomfort that were missed.
Prompt: why does my cat attack me out of the blue
Response:
Cats can sometimes exhibit sudden aggressive behavior for various reasons.
Not timely, if the query asked for a specific date but the system provided an outdated or different information. However, if there’s no indicated date in the query, and the system provided a relevant, specific, but outdated response, consider it relevant and must be fact-checked. It should be negated in the fact-checking process.
Example of Not timely response (no fact-checking)
Prompt: what day is Mother’s Day celebrated on 2026
Answer Date: November 5, 2025
Answer: Mother’s Day was celebrated on May 11, 2025.
Example of Relevant response (to be fact-checked and negated with timely information)
Prompt: when is Mother’s Day
Answer Date: November 5, 2025
Answer: Mother’s Day will be celebrated on May 11, 2025.
Do not label this step if the query was found unintelligible, ambiguous, with harmful intent, or it is not seeking for information.
If the evaluation “Is the response relevant” was answered No, the process stops here.
Step 1.4 Is the response CORRECT?
A correct answer should be informative and provide valid information to the user that directly addresses the question being asked.
Step 1.2.1 Enhanced Fact-checking
Step 1.2.1.1 Claims Identification
Step 1: We need to identify all claims within the response. A claim is a text segment containing a statement of fact that can be proved or disproved with evidence. A response may consist of zero to many claims.
There can be multiple claims within a sentence. A claim may also span multiple sentences. For example: “All birds fly” is a claim. A piece of text that makes a topic introduction or an overall conclusion is not considered a claim, for example, “Here is a list of birds that cannot fly”. Common knowledge does not need to be identified as a claim (e.g. “Sleep is important to our overall health”) unless they are about something that has been widely discredited or invalidated (e.g. “Everyone needs 8 hrs of sleep a day”).
Additional Guidance: Understanding Recommendations, Opinions, and Non-Claims
Recommendations and opinions reflect personal views or judgments are not factual claims.
These statements depend on subjective factors like reviews, ratings, or personal preferences, which can vary widely.
When annotating, focus only on factual claims about products, places, or people. Do not treat subjective terms like “best,” “worst,” or “most popular” as claims because they depend on personal or source-based criteria.
Checklist for Identifying Claims vs. Non-Claims
You should ask:
Does it depend on objective facts, not personal opinions or preferences?
Does it avoid subjective terms like “best,” “worst,” or “most popular”?
If all answers are yes, treat it as a factual claim.
If any answer is no, it is likely a recommendation or opinion and should not be considered as a claim.
Claim-Splitting (Voice-Text) |
EXAMPLES
| CLAIM-SPLITTING |
Prompt: what lake does chicago border
Response: Chicago is located on the southwestern shore of Lake Michigan, one of the five Great Lakes of North America. This massive freshwater lake spans all 28 miles of Chicago's shoreline, offering beautiful views and recreational opportunities for residents and visitors alike. Lake Michigan plays a vital role in Chicago's economic, social, and recreational life, serving as a scenic backdrop for the city's iconic skyline.
| - Chicago is located on the southwestern shore of Lake Michigan
- one of the five Great Lakes of North America.
- This massive freshwater lake spans all 28 miles of Chicago's shoreline,
- offering beautiful views and recreational opportunities for residents and visitors alike.
- Lake Michigan plays a vital role in Chicago's economic, social, and recreational life,
- serving as a scenic backdrop for the city's iconic skyline. |
Prompt: what does the word susceptible mean
Response: The word 'susceptible' means easily influenced, affected, or harmed by something. It describes a vulnerability or sensitivity to certain influences, like illnesses or environmental factors.
| - The word 'susceptible' means easily influenced, affected, or harmed by something.
- It describes a vulnerability or sensitivity to certain influences,
- like illnesses or environmental factors. |
Prompt: what vitamin does apple have
Response: Apples are a nutritious fruit that contains several important vitamins. According to the most recent factual information:\n\n * Vitamin C: 14% of the daily recommended intake\n * Vitamin A: 2% of the daily recommended intake\n* Vitamin B6: 4% of the daily recommended intake\n* Vitamin K: 5% of the daily recommended intake\n\n Did you know that apples are also a good source of dietary fiber and potassium? They're a well-rounded fruit that can contribute to a healthy diet.
| - Vitamin C: 14% of the daily recommended intake
- Vitamin A: 2% of the daily recommended intake
- Vitamin B6: 4% of the daily recommended intake
- Vitamin K: 5% of the daily recommended intake
- apples are also a good source of dietary fiber and potassium |
This hotel is the best in the city. It has a rooftop pool and free breakfast. | Claim: It has a rooftop pool and free breakfast. Non-claim: This hotel is the best in the city |
Prompt: the best player on the packers team Response date: 07-14-2025
Response: Rashan Gary is currently considered the best player on the Green Bay Packers team. He's a standout defensive end who recently made NFL.com's annual ranking of top 100 players.
Prompt: who is the best center fielder of all time Response date: 07-19-2025
Response: Willie Mays is widely considered the greatest center fielder of all time. He played 2,829 games in center field, more than any other player in history.
| Factual Claim: If multiple credible sources (e.g., official records, verified sports statistics, recognized experts) consistently agree on the information, mark it as a factual claim. Example: "Steph Curry holds the record for the most 3-pointers made in a season" is verifiable through official NBA statistics.
Opinion/Recommendation: If the statement reflects a personal view or cannot be objectively measured and could vary from person to person (e.g., “Steph Curry is the best shooter ever”), it’s an opinion. |
Step 1.2.1.2 Claims Verification
Step 2a: In step 2, verify the factual accuracy of each claim. Refer to the input context (if available) and the query, and check each claim against trustworthy online sources. Some claims are time-sensitive and are true only as of the date and time of the input context. For each claim identified in step 1, select:
Reminder: Always set your Google default location to US.
Go to the Google page >> click Settings >> Search settings >> Other settings >> Language & region >> Results region >> choose United States.
Cases requiring flexible judgement
Some claims may not have a single “correct” value but still fall within an acceptable range. In these cases, a claim should be marked as Correct if it:
Accurately reflects the state of events on the same day, or
Falls within a reasonable range according to credible sources.
This applies to the following types of claims:
Dynamic Financial Indicators: These values can fluctuate rapidly and are often time-specific. Examples include:
Stock prices
Individual net worth
Exchange rates and interest rates
Real-time rates for services (e.g., flights, Ubers, hotels)
Market capitalization of companies
Live Events: These should be considered correct if the response matches the event status at any point on the referenced day. Examples include:
Sports scores
Election results
Poll ratings (e.g., presidential approval ratings, election polling, TV show viewer ratings, etc.)
Measurements Based on Estimates: Such values vary by size, quality, preparation, or source and should be judged with reasonable flexibility. Examples include:
Nutrient content in food
Price of rare items (e.g., minerals, rare coins, exotic cars, fine art)
Construction costs
Time to complete a task (e.g., “It takes 2 hours to hike this trail”)
Business Attributes: For details such as phone numbers, opening hours, accessibility, menu items/pricing, or accepted payment methods, fact-check using the sources below in the following order. Mark the response as correct if:
The attribute matches the business’s official website, or
If no official website is available, the attribute matches any one of the next three sources
Relative Geographic Location: Do not penalize minor directional inaccuracies in responses (e.g., “north” instead of “northeast”) if the response is still broadly accurate.
Table 8: Cases requiring flexible judgement |
|---|
Type of Claims | Example Queries | Example Responses | What sources say | Rationale |
Dynamic financial indicators | what is the value of one bitcoin (on May 23 2025) | One Bitcoin is currently worth $108,800, which is down 2.4% over the last 24 hours. | $106,841.30 - $111,798.91 | Claims are between the high and low for the day and should therefore be graded as correct |
| price of solana crypto (on May 23 2025) | The current price of Solana is $180.90, up less than 1% over the last 24 hours. | $173.57 - $187.28 | |
| who is the wealthiest man on earth (as of June 8 2025) | Elon Musk is currently the wealthiest person in the world, with a net worth of $342 billion. He's the CEO of Tesla and SpaceX. | Musk remains the richest person on the planet with an estimated total net worth of $342 billion as of Sunday, June 8. | |
| what is the price of microsoft (as of June 10 2025) | Microsoft's stock is currently trading at $468.47 on the NASDAQ, down 0.9% since the previous close. | $466.96 - $472.80 | |
Measurements based on estimates | does butter have cholesterol | Yes, butter does contain cholesterol. According to recent information, one pat of butter has about 10.8 milligrams of cholesterol. | 10.75mg for 1 pat of butter; 11mg for 1 pat of butter | Claims are within the range of answers on the internet and should therefore be graded as correct |
| how much was a pack of cigarettes in 1984 | In 1984, the average price of a pack of cigarettes was about 80 cents. This was significantly lower than today's prices, which average around $6.16 per pack. | 91 cents a pack in 1984; 72 cents a pack in 1984; $1.19 a pack in 1984 | |
| what is a healthy weight for a 50 year old woman 5'8" | Based on the information available, a healthy weight range for a 50-year-old woman who is 5'8" tall is typically between 125 and 164 pounds. This range is based on a moderate body mass index (BMI) of 18.5 to 24.9, which is generally considered healthy for most adults. | 126 - 154 lbs; 125 - 164 lbs | |
| how much does it cost to fly air force one | Based on recent reports, the cost to fly Air Force One ranges from about $177,000 to $200,000 per hour. This includes fuel, crew, and maintenance expenses. | about $177,843 per flight hour in 2022; about $200,000 per hour in 2025 | |
| how many calories does a red apple have | A medium red apple typically contains about 115 calories. This can vary slightly depending on the specific type and size of the apple. | A medium red apple ranges from 70 to 125 calories according to various sources. | |
Live events | What’s Trump’s approval rating (as of June 14 2025) | President Trump's approval rating is 46.7% as of yesterday, according to RealClearPolitics. This represents a slight increase of 0.8% over the last month. | The claim is accurate as per the source mentioned. Other sources give slightly different numbers around the same figure. | If the state in question was true at any point during that day, the response should be graded as correct |
| Who won the US opens (June 12 2025) | The 2025 U.S. Open is currently underway. J.J. Spaun is leading at 4 under par, with Si Woo Kim, Gary Woodland, and Thomas Detry tied for second at 2 under par. | The question was potentially asked while the tournament was ongoing, and the response reflected the score at that time. | |
Local attributes for businesses | northern tier library opens when | The Northern Tier Library opens at 9:00 AM on Tuesdays. | Google Maps gives 9AM-7PM on Tuesdays; the library website also gives 9AM-7PM on Tuesdays; an outdated (2020) news site gives 10AM-6PM | Should be graded as correct, as it aligns with the official website |
| What time does Goat Tree close? | Goat tree in Santa Barbara is open until 5PM on Thursdays | Website gives dine-in closes at 5PM but Grab and Go closes at 8PM; Google Maps gives 5PM; Yelp gives 5PM; TripAdvisor gives 7:30PM; OpenTable gives 7:30PM | Should be graded as correct (any answer that gives 5PM or 8PM as a response should be correct) |
| Contact number for Goat Tree restaurant | 805-882-0137 | Website, Google and Yelp give 805-882-0137; TripAdvisor gives 805-882-0100 | Should be graded as correct (the TripAdvisor number should be considered incorrect) |
Relative geographic location | Where in the world is Ukraine? | Ukraine is located in Eastern Europe. It's bordered by Russia to the east, Belarus to the north, Poland, Slovakia, and Hungary to the west, and Romania, Moldova, and the Black Sea to the south | Romania and Moldova are mostly to the southwest and south-southwest of Ukraine, not strictly directly south. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
| where is the dead sea | The Dead Sea is located in southwestern Asia, bordered by Jordan to the east and Israel and the West Bank to the west. It's a landlocked salt lake, known for being the lowest point on Earth's surface, at 439.78 meters below sea level. | Israel is more to the southwest than directly west, but the statement would not be considered factually wrong. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
| Where is the Himalayas mountain range? | The Himalayas stretch across southern Asia, running along the northern border of India and separating it from China and Nepal. | Nepal is mostly within the Himalayas rather than just separated by them. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
| Where is the Philippines located? | The Philippines is in Southeast Asia, east of Vietnam and south of Taiwan. | It's more southeast of Taiwan than directly south. | Should be graded as correct. The directional framing still gives a useful and roughly accurate geographic orientation. |
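The range-based rows of Table 8 (dynamic financial indicators and measurements based on estimates) share one check: does the claimed value sit between the lowest and highest values reported by credible sources? The sketch below illustrates that check only. The function name `within_reported_range` is hypothetical and not part of any tool, and the guideline still calls for human judgement about what counts as a "reasonable range".

```python
def within_reported_range(claimed: float, source_values: list[float]) -> bool:
    """Return True if the claimed value lies between the lowest and highest
    values reported by the sources consulted (inclusive).

    Illustrative helper only: the name and signature are assumptions.
    """
    if not source_values:
        raise ValueError("at least one source value is required")
    return min(source_values) <= claimed <= max(source_values)


# Figures below are copied from Table 8.
print(within_reported_range(108_800, [106_841.30, 111_798.91]))  # Bitcoin, May 23 2025 -> True
print(within_reported_range(468.47, [466.96, 472.80]))           # Microsoft, June 10 2025 -> True
print(within_reported_range(0.80, [0.72, 0.91, 1.19]))           # cigarettes, 1984 -> True
print(within_reported_range(115, [70, 125]))                     # red apple calories -> True
```

A value outside every reported figure returns False, which is a prompt to look closer, not an automatic "Incorrect" label.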
Additional Fact-checking Examples and Reminders |
Feedback | Category | Examples |
If the response mentions a specific time with corresponding values, points, and percentages, use the Yahoo Finance day's range for the stock indices. | Business / Finance | Prompt: what is the stock price of tesla Date of Query: 2025-07-02 Response: As of 8:48 AM today, Tesla's stock is trading at $313.68 on the NASDAQ, up 4.3% since the previous close. This price was observed at 9:07 AM Pacific Daylight Time on Wednesday, July 2, 2025.
Claim: As of 8:48 AM today, Tesla's stock is trading at $313.68 on the NASDAQ
Evidence on Yahoo Finance: https://ca.finance.yahoo.com/quote/TSLA/history/
NasdaqGS - Nasdaq Real Time Price USD Tesla, Inc. (TSLA)
Jul 2, 2025 High 316.83 Low 303.82 Close 315.65
Jul 1, 2025 Close 300.71
Comment: Factually correct. $313.68 is within the day range of $303.82 - $316.83.
Claim: up 4.3% since the previous close
Computation: [(Claim - Prev. Closing Price) / Prev. Closing Price] x 100%
(313.68 - 300.71) ÷ 300.71 = 12.97 ÷ 300.71 ≈ 0.0431; 0.0431 × 100 ≈ 4.31%, or up by 4.3%
Comment: The percentage change is factually correct.
Note: If the points difference or percentage change is incorrect, include the correct computed value in the evidence section. Do not include the computation in the evidence.
|
Mark the claims as not timely if the tenses do not align with the query date.
| Structure of the Response | Prompt: how many more seasons of the simpsons will there be Response date: 2025-03-31
Response: The Simpsons has been renewed for seasons 35 and 36, which will air through the 2024-25 season. While there's no official word on further renewals, the show remains popular and could continue beyond that.
Note: As of the query date, Season 35 had already concluded while Season 36 was ongoing, so the response should be marked as not timely.
|
As long as the response is relevant, fact-check it. Right deflection: mark it as correct. Wrong deflection: mark it as not relevant and not specific. The goal is the factuality of the responses; limitations of the model (e.g., privacy) should not negatively impact the response.
| Deflections | a. Prompt: are cheerios good Response date: 2025-07-12
Response: I apologize, but I'm currently unable to access the information about Cheerios. Is there anything else I can help you with?
b. Prompt: news on president trump Response date: 2025-07-14
Response: I'm sorry, I can't look up real-time news right now. Is there something specific about President Trump you'd like to know?
a-b. Relevant: No. Mark as not relevant and not specific. No fact check needed.
c. Prompt: is there a dodgers game today Response date: 2025-07-11
Response: I don't have information about the Dodger’s game today.
d. Prompt: has p diddy been sentenced Response date: 2025-07-11
Response: I don't have current information about P Diddy's legal status.
c-d. Relevant: Yes. If information is available, provide the answer to the prompt and make sure the evidence falls within the date of the query. Mark the response as incorrect, major issue, core fact incorrect. If no information can be found on the web, treat the deflection as correct.
e. Prompt: how much does a cane corso puppy cost
Response: I apologize, but I'm currently unable to retrieve the most up-to-date information on Cane Corso puppy prices. Prices can vary widely depending on factors like breeder reputation, location, and the puppy's pedigree. Generally, Cane Corso puppies from reputable breeders can be quite expensive due to their popularity and the costs associated with responsible breeding practices.\n\n Here's some general information about Cane Corso puppies:\n\n 1. **Price Range**: Typically, Cane Corso puppies can cost anywhere from $1,000 to $4,000 or more.\n\n 2. **Factors Affecting Price**:\n - Bloodline and pedigree\n - Breeder reputation\n - Geographic location\n - Color and physical characteristics\n - Health certifications and genetic testing\n\n 3. **Additional Costs**: Remember to factor in ongoing expenses such as:\n - Food (Cane Corsos are large dogs and eat a lot)\n - Veterinary care\n - Training\n - Supplies (bed, crate, toys, etc.)\n\n 4. **Adoption Alternative**: Consider adopting from a Cane Corso rescue organization, which can be less expensive and gives a home to a dog in need.\n\n 5. **Research Thoroughly**: Due to their size and temperament, Cane Corsos require experienced owners. Ensure you're prepared for the commitment before purchasing.\n\n Would you like more information about Cane Corsos in general, or are you interested in specific aspects of owning this breed?
e. Relevant: Yes. Exclude the deflection part of the response and claim-split the valid claims. Fact-check the valid claims as they are.
|
Provide the latest information to negate outdated claims. | Others | Prompt: where is st lawrence university Answer Date: 3/27/25
Response: St. Lawrence University is located in Canton, New York. It's a private liberal arts college founded in 1856, situated on a 1,100-acre campus in a rural setting. The university offers over 40 majors across various disciplines and has an enrollment of around 2,250 students.
Claim: situated on a 1,100-acre campus in a rural setting. Evidence used: It has a total undergraduate enrollment of 2,060 (fall 2023), and the campus size is 1,100 acres.
Available information: https://www.stlawu.edu/offices/institutional-research/slu-quick-facts
Statistics for Fall 2024 as of October 1 Student Enrollments Headcount Overall Enrollment 1,991 |
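The two checks in the Business / Finance example above (price within the day's range, and percentage change since the previous close) can be reproduced with a few lines of Python. This is a minimal sketch: the function names and the 0.05-point tolerance are assumptions chosen so that a claim rounded to one decimal place still matches. They are not values defined by the guideline.

```python
def percent_change(claimed_price: float, previous_close: float) -> float:
    """[(Claim - Prev. Closing Price) / Prev. Closing Price] x 100."""
    return (claimed_price - previous_close) / previous_close * 100


def change_matches(claimed_price: float, previous_close: float,
                   claimed_pct: float, tolerance: float = 0.05) -> bool:
    """True if the stated percentage agrees with the computed one.

    The tolerance is an assumed allowance for one-decimal rounding.
    """
    return abs(percent_change(claimed_price, previous_close) - claimed_pct) <= tolerance


# Tesla figures from the example: Jul 2, 2025 day range and Jul 1, 2025 close.
day_low, day_high = 303.82, 316.83
claimed_price, previous_close = 313.68, 300.71

print(day_low <= claimed_price <= day_high)                     # True
print(f"{percent_change(claimed_price, previous_close):.2f}")   # 4.31
print(change_matches(claimed_price, previous_close, 4.3))       # True
```

If the computed change does not match the claim, only the corrected value belongs in the evidence section, not the computation itself.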
Step 2b: (skip this step and mark it as null or None if your response to the claim in step 2a was “Correct”). Assess the magnitude of the claim inaccuracy if the claim was labeled as incorrect. Assess the magnitude of the fact omission if the claim was labeled as partially correct. Assess the significance of the claim if the claim was labeled as inconclusive.
Minor if most readers would not notice the error, find it jarring or deem it significant. If printed in a newspaper, the newspaper may not need to print a correction.
Major if most readers knowledgeable in the space would likely recognize the error. If printed in a newspaper, the newspaper would have to print a correction or retraction to maintain its reputation.
Inconclusive if you found conflicting information online that both supports and contradicts the claim, leaving you uncertain whether the claim is true or false.
Step 1.2.1.3 Core Answer Identification
Step 3a: In step 3, we need to identify all the core answers among the claims in the response. A core answer is the main idea or the defining aspect that addresses the query. There can be multiple claims labeled as core answers. For each claim, select:
Example:
Prompt: how old is elton john
Response Date: 11-05-2025
Response: Elton John is 78 years old. He was born on March 25, 1947.
> The core claim is: Elton John is 78 years old.
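Because an age claim is time-sensitive, it should be checked against the response date and not against today's date. The sketch below shows one way to do that arithmetic. The helper name `age_on` is hypothetical, and the response date 11-05-2025 is assumed to be in MM-DD-YYYY format (November 5, 2025).

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Completed years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


birth = date(1947, 3, 25)          # birth date stated in the response
response_date = date(2025, 11, 5)  # assumed MM-DD-YYYY reading of 11-05-2025

print(age_on(birth, response_date))  # 78
```

The same call with a date before March 25, 2025 returns 77, which is why the response date matters when grading the core claim.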
Step 1.2.1.4 Reason for the Incorrectness
Step 4a: In step 4, label the grounds on which the response is incorrect, based on all the false claims found. One or more labels can be used. After fact-checking the whole response, select from the following reasons if it was found incorrect:
Core fact incorrect - if the main answer to the query is false
Additional facts incorrect - if the additional information to the query or topic is false
Not timely - if the response contains outdated information
Not relevant - if the response has claims with information not related to the main topic
Not specific - if the response has provided relevant information but does not specifically answer the query.
Step 1.2.1.5 Link to source of information
Step 5a: For each claim marked as Correct or Incorrect, include the main source link used to support your decision.
Reminder: Wikipedia can be used as evidence, but do not provide information from unreliable sources such as Medium, Reddit, Quora, blogs, social media sites, AI tools / LLMs, etc. Please refer to the main guidelines for the complete list: Response Correctness Evaluation: Voice and Text.docx