def _source_to_t2t(self, example):
    example_ = {}
    example_["document_id"] = ""
    example_["text_1_name"] = ""
    example_["text_2_name"] = ""

    text1 = ""
    text1 += "Question ID: " + example["question_id"] + "\n"
    text1 += "Question: " + example["question"] + "\n"
    for article in example["articles"]:
        text1 += "Answer ID: " + article["answer_id"] + "\n"
        text1 += "Answer: " + article["text"] + "\n"
        text1 += "Rating: " + article["rating"] + "\n"
    example_["text_1"] = text1

    example_["text_2"] = example["summary"]

    return example_
This transforms the source data to fit the t2t schema.
The summarization task works like this: question + answer -> summarized_answer. So for the t2t schema I concatenated all relevant values, separated by "\n", into the value of text_1.
An example from page2answer_single_abstractive:
"1_Answer4": {
    "summary": "Abetalipoproteimemia, also known as Bassen-Kornzweig syndrome, ... ",
    "articles": " Bassen-Kornzweig syndrome Abetalipoproteinemia Acanthocytosis Apolipoprotein B deficiency...",
    "question": "abetalipoproteimemia hi, I would like to know if there is any support for those suffering with abetalipoproteinemia? ...",
    "question_id": "1",
    "rating": "3-Incomplete"
}
Here "1_Answer4" is the answer_id used above, and "articles" corresponds to article["text"].
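To make the mapping concrete, here is a minimal sketch of how one raw entry of this shape could be regrouped and passed through the same concatenation logic. The helper `regroup_raw_record` and the placeholder strings are hypothetical and only for illustration; the actual loader may build the nested `articles` list differently.

```python
def source_to_t2t(example):
    """Standalone copy of the concatenation logic in _source_to_t2t."""
    text1 = "Question ID: " + example["question_id"] + "\n"
    text1 += "Question: " + example["question"] + "\n"
    for article in example["articles"]:
        text1 += "Answer ID: " + article["answer_id"] + "\n"
        text1 += "Answer: " + article["text"] + "\n"
        text1 += "Rating: " + article["rating"] + "\n"
    return {
        "document_id": "",
        "text_1_name": "",
        "text_2_name": "",
        "text_1": text1,
        "text_2": example["summary"],
    }


def regroup_raw_record(answer_id, raw):
    """Hypothetical helper: reshape one raw JSON entry into the nested
    form that source_to_t2t expects (an assumption, not the PR's code)."""
    return {
        "question_id": raw["question_id"],
        "question": raw["question"],
        "summary": raw["summary"],
        "articles": [
            {
                "answer_id": answer_id,      # e.g. "1_Answer4"
                "text": raw["articles"],     # raw "articles" is the answer text
                "rating": raw["rating"],
            }
        ],
    }


# Placeholder values stand in for the real (truncated) dataset strings.
raw = {
    "summary": "short summary",
    "articles": "long answer text",
    "question": "example question?",
    "question_id": "1",
    "rating": "3-Incomplete",
}

t2t = source_to_t2t(regroup_raw_record("1_Answer4", raw))
print(t2t["text_1"])
print(t2t["text_2"])
```

In this sketch text_1 carries the question and the rated answer, and text_2 carries the reference summary, which matches the question + answer -> summarized_answer framing.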
sunnnymskang
left a comment
@nomisto In the description, can you add information about the subset_ids (and note that mediqa_ans_all implements only the source schema)? Confirmed that all other 8 subset_ids pass unit tests.
Hi @sunnnymskang, sure, I've added a description to the value of _DESCRIPTION and the docstring.

@nomisto Can you remind me why this fits the t2t schema better than question answering? We want to merge this PR asap; it looks mostly ok.

Hi @hakunanatasha, the name of this dataset is a little misleading: it is a summarization task, more specifically an answer summarization task. So the input is question + answer and the task is to generate a summary of that answer.

@nomisto got it; I'll merge this later today. Sorry for the hold up. I assume that since it's a summarization task, the text-1/2-name are also blank as there is nothing to update here.
Closes #427
The dataset contains 8 different subset_ids (different dataset settings), each with a bigbio and a source schema.

Furthermore, there is a subset called mediqa_ans_all, which includes all data (articles, sections, URLs of documents, all four different kinds of summaries, ...). I did not implement a bigbio schema for the all view, as I think this does not make sense here. Since the bigbio schema is missing for all, tests fail for subset mediqa_ans_all.

Tests: