diff --git "a/README.md" "b/README.md"
new file mode 100644
--- /dev/null
+++ "b/README.md"
@@ -0,0 +1,1682 @@
+---
+language:
+- yue
+license: apache-2.0
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:129371
+- loss:CachedGISTEmbedLoss
+base_model: hon9kon9ize/bert-large-cantonese-sts
+widget:
+- source_sentence: 'query: is ampulla of vater part of the pancreas'
+ sentences:
+ - 'document: Ampulla of Vater The ampulla of Vater, also known as the hepatopancreatic
+ ampulla or the hepatopancreatic duct, is formed by the union of the pancreatic
+ duct and the common bile duct. The ampulla is specifically located at the major
+ duodenal papilla.'
+ - 'document: 抗凝加化疗;化疗'
+ - 'document: Daylight saving time in Australia Daylight saving was first used in
+ Australia during World War I, and was applied in all states. It was used again
+ during the Second World War. A drought in Tasmania in 1967 led to the reintroduction
+ of daylight saving in that state during the summer, and this was repeated every
+ summer since then. In 1971, New South Wales, Victoria,[16] Queensland, South Australia,
+ and the Australian Capital Territory followed Tasmania by observing daylight saving.
+ Western Australia and the Northern Territory did not. Queensland abandoned daylight
+ saving time in 1972.[17]'
+- source_sentence: 'query: henry''s law states that the solubility of a gas in a liquid'
+ sentences:
+ - 'document: Henry''s law In chemistry, Henry''s law is a gas law that states that
+ the amount of dissolved gas is proportional to its partial pressure in the gas
+ phase. The proportionality factor is called the Henry''s law constant. It was
+ formulated by the English chemist William Henry, who studied the topic in the
+ early 19th century. In his publication about the quantity of gases absorbed by
+ water,[1] he described the results of his experiments:'
+ - 'document: Saint Stephen''s Day Saint Stephen''s Day, or the Feast of Saint Stephen,
+ is a Christian saint''s day to commemorate Saint Stephen, the first Christian
+ martyr or protomartyr, celebrated on 26 December in the Latin Church and 27 December
+ in Eastern Christianity. The Eastern Orthodox Church adheres to the Julian calendar
+ and mark Saint Stephen''s Day on 27 December according to that calendar, which
+ places it on 9 January of the Gregorian calendar used in secular contexts. In
+ Latin Christian denominations, Saint Stephen''s Day marks the second day of Christmastide.[1][2]'
+ - 'document: American Revolutionary War The American Revolutionary War (1775–1783),
+ also known as the American War of Independence,[40] was a global war that began
+ as a conflict between Great Britain and its Thirteen Colonies which declared independence
+ as the United States of America.[N 1]'
+- source_sentence: 'query: what is the plot of american horror story hotel'
+ sentences:
+ - 'document: American Horror Story: Hotel The plot centers around the enigmatic
+ Hotel Cortez in Los Angeles, California, that catches the eye of an intrepid homicide
+ detective (Bentley). The Cortez is host to the strange and bizarre, spearheaded
+ by its owner, The Countess (Gaga), who is a bloodsucking fashionista. The hotel
+ is loosely based on an actual hotel built in 1893 by H. H. Holmes in Chicago,
+ Il. for the 1893 World''s Columbian Exposition. It became known as the ''Murder
+ Castle'' as it was built for Holmes to torture, murder, and dispose of evidence
+ just as is the Cortez. This season features two murderous threats in the form
+ of the Ten Commandments Killer, a serial offender who selects his victims in accordance
+ with biblical teachings, and "the Addiction Demon", who roams the hotel armed
+ with a drill bit dildo.'
+ - 'document: Book of Job Rabbinic tradition ascribes the authorship of Job to Moses,
+ but scholars generally agree that it was written between the 7th and 4th centuries
+ BCE, with the 6th century BCE as the most likely period for various reasons.[17]
+ The anonymous author was almost certainly an Israelite, although he has set his
+ story outside Israel, in southern Edom or northern Arabia, and makes allusion
+ to places as far apart as Mesopotamia and Egypt.[18] According to the 6th-century
+ BCE prophet Ezekiel, Job was a man of antiquity renowned for his righteousness,[19]
+ and the book''s author has chosen this legendary hero for his parable.[20]'
+ - 'document: Galešnjak Galešnjak (also called Island of Love, Lover''s Island, Otok
+ za zaljubljene) is located in the Pašman channel of the Adriatic, between the
+ islands of Pašman and the town of Turanj on mainland Croatia. It is one of the
+ world''s few naturally occurring heart-shaped objects such as the Heart Reef in
+ the Whitsundays.'
+- source_sentence: 'query: what historical event inspired wollstonecraft''s book a
+ vindication of the rights of woman'
+ sentences:
+ - 'document: 銀河嘅獨特外形自古以嚟就引起人類嘅幻想。例如中國就有牛郎織女嘅故事,相傳身為人類嘅牛郎同身為仙女嘅織女相遇並且墮入愛河,但因為人仙相戀犯天規而俾天界阻止,王母娘娘變條銀河出嚟分隔佢哋,限佢哋淨係喺每年嘅農曆七月初七先可以喺條鵲橋上面相會-呢個傳說就係傳統節日七姐誕嘅起源。'
+ - 'document: Rock Star (2001 film) The singing voice for Wahlberg''s character was
+ provided by Steelheart frontman Miljenko Matijevic for the Steel Dragon Songs,
+ the final number was dubbed by Brian Vander Ark. Jeff Scott Soto (of Talisman,
+ Yngwie Malmsteen, Soul SirkUS, and Journey) provided the voice of the singer Wahlberg''s
+ character replaces. Kennedy is the only actor whose actual voice is used.[citation
+ needed]. Ralph Saenz (Steel Panther) also appears briefly, as the singer auditioning
+ ahead of Chris at the studio.'
+ - 'document: A Vindication of the Rights of Woman Wollstonecraft was prompted to
+ write the Rights of Woman after reading Charles Maurice de Talleyrand-Périgord''s
+ 1791 report to the French National Assembly, which stated that women should only
+ receive a domestic education; she used her commentary on this specific event to
+ launch a broad attack against sexual double standards and to indict men for encouraging
+ women to indulge in excessive emotion. Wollstonecraft wrote the Rights of Woman
+ hurriedly to respond directly to ongoing events; she intended to write a more
+ thoughtful second volume but died before completing it.'
+- source_sentence: 'query: when did england change from fahrenheit to celsius'
+ sentences:
+ - 'document: Periodic table Importantly, the organization of the periodic table
+ can be utilized to derive relationships between various element properties, but
+ also predicted chemical properties and behaviours of undiscovered or newly synthesized
+ elements. Russian chemist Dmitri Mendeleev was first to publish a recognizable
+ periodic table in 1869, developed mainly to illustrate periodic trends of the
+ then-known elements. He also predicted some properties of unidentified elements
+ that were expected to fill gaps within this table. Most of his forecasts proved
+ to be correct. Mendeleev''s idea has been slowly expanded and refined with the
+ discovery or synthesis of further new elements and by developing new theoretical
+ models to explain chemical behaviour. The modern periodic table now provides a
+ useful framework for analyzing chemical reactions, and continues to be widely
+ adopted in chemistry, nuclear physics and other sciences.'
+ - 'document: How to Train Your Dragon (franchise) The How to Train Your Dragon franchise
+ from DreamWorks Animation consists of two feature films How to Train Your Dragon
+ (2010) and How to Train Your Dragon 2 (2014), with a third feature film, How to
+ Train Your Dragon: The Hidden World, set for a 2019 release. The franchise is
+ inspired by the British book series of the same name by Cressida Cowell. The franchise
+ also consists of four short films: Legend of the Boneknapper Dragon (2010), Book
+ of Dragons (2011), Gift of the Night Fury (2011) and Dawn of the Dragon Racers
+ (2014). A television series following the events of the first film, Dragons: Riders
+ of Berk, began airing on Cartoon Network in September 2012. Its second season
+ was renamed Dragons: Defenders of Berk. Set several years later, and as a more
+ immediate prequel to the second film, a new television series, titled Dragons:
+ Race to the Edge, aired on Netflix in June 2015.[1] The second season of the show
+ was added to Netflix in January 2016 and a third season in June 2016. A fourth
+ season aired on Netflix in February 2017, a fifth season in August 2017, and a
+ sixth and final season on February 16, 2018.'
+ - 'document: Metrication in the United Kingdom Adopting the metric system was discussed
+ in Parliament as early as 1818 and some industries and even some government agencies
+ had metricated, or were in the process of metricating by the mid 1960s. A formal
+ government policy to support metrication was agreed by 1965. This policy, initiated
+ in response to requests from industry, was to support voluntary metrication, with
+ costs picked up where they fell. In 1969 the government created the Metrication
+ Board as a quango to promote and coordinate metrication. In 1978, after some carpet
+ retailers reverted to pricing by the square yard rather than the square metre,
+ government policy shifted, and they started issuing orders making metrication
+ mandatory in certain sectors. In 1980 government policy shifted again to prefer
+ voluntary metrication, and the Metrication Board was abolished. By the time the
+ Metrication Board was wound up, all the economic sectors that fell within its
+ remit except road signage and parts of the retail trade sector had metricated.'
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_accuracy@10
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_precision@10
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_recall@10
+- cosine_ndcg@10
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: Bert base fine-tuned with Cantonese and English mixed STS dataset
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoClimateFEVER
+ type: NanoClimateFEVER
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.06
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.2
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.22
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.26
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.06
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.06666666666666667
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.05200000000000001
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.032
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.035
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.105
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.12666666666666665
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.14400000000000002
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.10738523976006756
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.12305555555555553
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.08386746046821102
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoDBPedia
+ type: NanoDBPedia
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.1
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.26
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.44
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.52
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.1
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.12666666666666665
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.15200000000000002
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.154
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.005776685612719247
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.025711996601987995
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.04879480020144454
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.08175565470928514
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.1564753058784049
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.22302380952380954
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.08481993410477483
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoFEVER
+ type: NanoFEVER
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.06
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.1
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.1
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.12
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.06
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.03333333333333333
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.02
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.012000000000000002
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.05
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.09
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.09
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.11
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.07804424038166692
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.07533333333333334
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.07658274436198606
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoFiQA2018
+ type: NanoFiQA2018
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.12
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.22
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.26
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.36
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.12
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.07999999999999999
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.064
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.046000000000000006
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.07085714285714287
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.13621428571428573
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.14993650793650792
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.21193650793650792
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.15989208858068493
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.18794444444444444
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.1278932041519149
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoHotpotQA
+ type: NanoHotpotQA
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.18
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.38
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.4
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.44
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.18
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.13333333333333333
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.084
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.05200000000000001
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.09
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.2
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.21
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.26
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.21524243911000313
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.2793333333333333
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.16949818775802034
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoMSMARCO
+ type: NanoMSMARCO
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.08
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.16
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.2
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.24
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.08
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.05333333333333333
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.04
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.024000000000000004
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.08
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.16
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.2
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.24
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.155021218726892
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.12816666666666665
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.14387227309213746
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoNFCorpus
+ type: NanoNFCorpus
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.1
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.1
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.12
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.18
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.1
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.06
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.05600000000000001
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.042
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.0023944899556066555
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.004511202133435534
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.005335271278326478
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.006887081773042016
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.0513758550014842
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.11271428571428571
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.011178329865269043
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoNQ
+ type: NanoNQ
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.12
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.26
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.38
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.44
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.12
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.08666666666666666
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.08
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.04800000000000001
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.11
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.24
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.37
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.43
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.26691470842049086
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.21954761904761902
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.22127704921258506
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoQuoraRetrieval
+ type: NanoQuoraRetrieval
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.56
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.66
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.68
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.8
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.56
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.25333333333333335
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.16
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.092
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.49
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.6073333333333334
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.634
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.7406666666666666
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.6315714749064664
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.6265555555555555
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.6007758177607536
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoSCIDOCS
+ type: NanoSCIDOCS
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.06
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.12
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.14
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.22
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.06
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.05333333333333333
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.036000000000000004
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.026000000000000002
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.015666666666666666
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.03666666666666667
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.04066666666666666
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.05666666666666667
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.05444580189319236
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.10085714285714287
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.03825732082321992
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoArguAna
+ type: NanoArguAna
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.12
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.34
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.52
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.64
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.12
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.11333333333333333
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.10400000000000001
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.064
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.12
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.34
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.52
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.64
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.36676045848370026
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.2815
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.28967419376346565
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoSciFact
+ type: NanoSciFact
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.18
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.22
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.32
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.36
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.18
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.07999999999999999
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.068
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.04
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.165
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.21
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.3
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.345
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.24854556538285397
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.22416666666666665
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.23077037853195492
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: NanoTouche2020
+ type: NanoTouche2020
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.3469387755102041
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.7142857142857143
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.7959183673469388
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9387755102040817
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.3469387755102041
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.32653061224489793
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.30612244897959184
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.2714285714285714
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.01725883684742171
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.06000832753846316
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.10128699807186763
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.17048580946181527
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.29344650277463163
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.5436912860382248
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.18279928418932134
+ name: Cosine Map@100
+ - task:
+ type: nano-beir
+ name: Nano BEIR
+ dataset:
+ name: NanoBEIR mean
+ type: NanoBEIR_mean
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.1605337519623234
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.2872527472527473
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.35199372056514916
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.42452119309262165
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.1605337519623234
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.11281004709576138
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.0940094191522763
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.0694945054945055
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.09630414014919672
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.17041890861447484
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.21512976237088305
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.2644152605549218
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.21424006917696453
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.2404530537489721
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.17394355216027804
+ name: Cosine Map@100
+---
+
+# Bert base fine-tuned with Cantonese and English mixed STS dataset
+
+This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [hon9kon9ize/bert-large-cantonese-sts](https://huggingface.co/hon9kon9ize/bert-large-cantonese-sts). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [hon9kon9ize/bert-large-cantonese-sts](https://huggingface.co/hon9kon9ize/bert-large-cantonese-sts)
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 1024 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Language:** yue
+- **License:** apache-2.0
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+)
+```
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+ 'query: when did england change from fahrenheit to celsius',
+ 'document: Metrication in the United Kingdom Adopting the metric system was discussed in Parliament as early as 1818 and some industries and even some government agencies had metricated, or were in the process of metricating by the mid 1960s. A formal government policy to support metrication was agreed by 1965. This policy, initiated in response to requests from industry, was to support voluntary metrication, with costs picked up where they fell. In 1969 the government created the Metrication Board as a quango to promote and coordinate metrication. In 1978, after some carpet retailers reverted to pricing by the square yard rather than the square metre, government policy shifted, and they started issuing orders making metrication mandatory in certain sectors. In 1980 government policy shifted again to prefer voluntary metrication, and the Metrication Board was abolished. By the time the Metrication Board was wound up, all the economic sectors that fell within its remit except road signage and parts of the retail trade sector had metricated.',
+ "document: Periodic table Importantly, the organization of the periodic table can be utilized to derive relationships between various element properties, but also predicted chemical properties and behaviours of undiscovered or newly synthesized elements. Russian chemist Dmitri Mendeleev was first to publish a recognizable periodic table in 1869, developed mainly to illustrate periodic trends of the then-known elements. He also predicted some properties of unidentified elements that were expected to fill gaps within this table. Most of his forecasts proved to be correct. Mendeleev's idea has been slowly expanded and refined with the discovery or synthesis of further new elements and by developing new theoretical models to explain chemical behaviour. The modern periodic table now provides a useful framework for analyzing chemical reactions, and continues to be widely adopted in chemistry, nuclear physics and other sciences.",
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 1024)
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
+
+## Evaluation
+
+### Metrics
+
+#### Information Retrieval
+
+* Datasets: `NanoClimateFEVER`, `NanoDBPedia`, `NanoFEVER`, `NanoFiQA2018`, `NanoHotpotQA`, `NanoMSMARCO`, `NanoNFCorpus`, `NanoNQ`, `NanoQuoraRetrieval`, `NanoSCIDOCS`, `NanoArguAna`, `NanoSciFact` and `NanoTouche2020`
+* Evaluated with [`InformationRetrievalEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+| Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
+|:--------------------|:-----------------|:------------|:----------|:-------------|:-------------|:------------|:-------------|:-----------|:-------------------|:------------|:------------|:------------|:---------------|
+| cosine_accuracy@1 | 0.06 | 0.1 | 0.06 | 0.12 | 0.18 | 0.08 | 0.1 | 0.12 | 0.56 | 0.06 | 0.12 | 0.18 | 0.3469 |
+| cosine_accuracy@3 | 0.2 | 0.26 | 0.1 | 0.22 | 0.38 | 0.16 | 0.1 | 0.26 | 0.66 | 0.12 | 0.34 | 0.22 | 0.7143 |
+| cosine_accuracy@5 | 0.22 | 0.44 | 0.1 | 0.26 | 0.4 | 0.2 | 0.12 | 0.38 | 0.68 | 0.14 | 0.52 | 0.32 | 0.7959 |
+| cosine_accuracy@10 | 0.26 | 0.52 | 0.12 | 0.36 | 0.44 | 0.24 | 0.18 | 0.44 | 0.8 | 0.22 | 0.64 | 0.36 | 0.9388 |
+| cosine_precision@1 | 0.06 | 0.1 | 0.06 | 0.12 | 0.18 | 0.08 | 0.1 | 0.12 | 0.56 | 0.06 | 0.12 | 0.18 | 0.3469 |
+| cosine_precision@3 | 0.0667 | 0.1267 | 0.0333 | 0.08 | 0.1333 | 0.0533 | 0.06 | 0.0867 | 0.2533 | 0.0533 | 0.1133 | 0.08 | 0.3265 |
+| cosine_precision@5 | 0.052 | 0.152 | 0.02 | 0.064 | 0.084 | 0.04 | 0.056 | 0.08 | 0.16 | 0.036 | 0.104 | 0.068 | 0.3061 |
+| cosine_precision@10 | 0.032 | 0.154 | 0.012 | 0.046 | 0.052 | 0.024 | 0.042 | 0.048 | 0.092 | 0.026 | 0.064 | 0.04 | 0.2714 |
+| cosine_recall@1 | 0.035 | 0.0058 | 0.05 | 0.0709 | 0.09 | 0.08 | 0.0024 | 0.11 | 0.49 | 0.0157 | 0.12 | 0.165 | 0.0173 |
+| cosine_recall@3 | 0.105 | 0.0257 | 0.09 | 0.1362 | 0.2 | 0.16 | 0.0045 | 0.24 | 0.6073 | 0.0367 | 0.34 | 0.21 | 0.06 |
+| cosine_recall@5 | 0.1267 | 0.0488 | 0.09 | 0.1499 | 0.21 | 0.2 | 0.0053 | 0.37 | 0.634 | 0.0407 | 0.52 | 0.3 | 0.1013 |
+| cosine_recall@10 | 0.144 | 0.0818 | 0.11 | 0.2119 | 0.26 | 0.24 | 0.0069 | 0.43 | 0.7407 | 0.0567 | 0.64 | 0.345 | 0.1705 |
+| **cosine_ndcg@10** | **0.1074** | **0.1565** | **0.078** | **0.1599** | **0.2152** | **0.155** | **0.0514** | **0.2669** | **0.6316** | **0.0544** | **0.3668** | **0.2485** | **0.2934** |
+| cosine_mrr@10 | 0.1231 | 0.223 | 0.0753 | 0.1879 | 0.2793 | 0.1282 | 0.1127 | 0.2195 | 0.6266 | 0.1009 | 0.2815 | 0.2242 | 0.5437 |
+| cosine_map@100 | 0.0839 | 0.0848 | 0.0766 | 0.1279 | 0.1695 | 0.1439 | 0.0112 | 0.2213 | 0.6008 | 0.0383 | 0.2897 | 0.2308 | 0.1828 |
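The metric names in the table follow the usual information-retrieval definitions. The sketch below is a minimal, dependency-free illustration for a single query with binary relevance; the function name `ranking_metrics` and the toy document ids are hypothetical and are not part of the evaluator's API. It is meant only to show why `accuracy@1` and `precision@1` coincide in the table and why `precision@k` tends to shrink as `k` grows.

```python
import math


def ranking_metrics(ranked_ids, relevant_ids, k):
    """Per-query accuracy@k, precision@k, recall@k, reciprocal rank@k and NDCG@k.

    ranked_ids:   document ids ordered by decreasing cosine similarity.
    relevant_ids: set of ids judged relevant for the query (binary relevance).
    """
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document inside the top k (0 if none).
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break

    # DCG with binary gains, normalised by the best achievable DCG at this cutoff.
    dcg = sum(1.0 / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1) if hit)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,  # at least one relevant hit in top k
        "precision": num_hits / k,                 # share of the top k that is relevant
        "recall": num_hits / len(relevant_ids),    # share of relevant docs retrieved
        "mrr": reciprocal_rank,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }


# Toy example: one relevant document appears at rank 2 of the top 3.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}
metrics = ranking_metrics(ranked, relevant, k=3)
print(metrics)
```

The reported numbers are these per-query values averaged over all queries of a dataset. At `k=1`, precision and accuracy are the same quantity, which matches the identical `cosine_accuracy@1` and `cosine_precision@1` rows above.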
+
+#### Nano BEIR
+
+* Dataset: `NanoBEIR_mean`
+* Evaluated with [`NanoBEIREvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.NanoBEIREvaluator)
+
+| Metric | Value |
+|:--------------------|:-----------|
+| cosine_accuracy@1 | 0.1605 |
+| cosine_accuracy@3 | 0.2873 |
+| cosine_accuracy@5 | 0.352 |
+| cosine_accuracy@10 | 0.4245 |
+| cosine_precision@1 | 0.1605 |
+| cosine_precision@3 | 0.1128 |
+| cosine_precision@5 | 0.094 |
+| cosine_precision@10 | 0.0695 |
+| cosine_recall@1 | 0.0963 |
+| cosine_recall@3 | 0.1704 |
+| cosine_recall@5 | 0.2151 |
+| cosine_recall@10 | 0.2644 |
+| **cosine_ndcg@10** | **0.2142** |
+| cosine_mrr@10 | 0.2405 |
+| cosine_map@100 | 0.1739 |
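The `NanoBEIR_mean` values appear to be the unweighted mean of the per-dataset scores in the Information Retrieval table. A quick check for `cosine_ndcg@10`, using only the numbers reported above (the dictionary name is just for illustration):

```python
# Per-dataset cosine_ndcg@10 values copied from the Information Retrieval table.
ndcg_at_10 = {
    "NanoClimateFEVER": 0.1074,
    "NanoDBPedia": 0.1565,
    "NanoFEVER": 0.078,
    "NanoFiQA2018": 0.1599,
    "NanoHotpotQA": 0.2152,
    "NanoMSMARCO": 0.155,
    "NanoNFCorpus": 0.0514,
    "NanoNQ": 0.2669,
    "NanoQuoraRetrieval": 0.6316,
    "NanoSCIDOCS": 0.0544,
    "NanoArguAna": 0.3668,
    "NanoSciFact": 0.2485,
    "NanoTouche2020": 0.2934,
}

# Unweighted mean over the 13 datasets.
mean_ndcg = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(round(mean_ndcg, 4))  # 0.2142, the value reported for NanoBEIR_mean
```

Because the mean is unweighted, a single strong dataset such as `NanoQuoraRetrieval` lifts the aggregate noticeably, so the per-dataset table is the better guide for a specific use case.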
+
+
+
+
+
+## Training Details
+
+### Training Dataset
+
+#### Unnamed Dataset
+
+
+* Size: 129,371 training samples
+* Columns: `query` and `answer`
+* Approximate statistics based on the first 1000 samples:
+  |      | query  | answer |
+  |:-----|:-------|:-------|
+  | type | string | string |
+* Samples:
+  | query | answer |
+  |:------|:-------|
+  | `query: hotel and restaurant employees and bartenders international union` | `document: Hotel Employees and Restaurant Employees Union The Hotel Employees and Restaurant Employees Union (HERE) was a United States labor union representing workers of the hospitality industry, formed in 1891. In 2004, HERE merged with the Union of Needletrades, Industrial, and Textile Employees (UNITE) to form UNITE HERE. HERE notably organized the staff of Yale University in 1984. Other major employers that contracted with this union included several large casinos (Harrah's, Caesars Palace, and Wynn Resorts); hotels (Hilton, Hyatt and Starwood), and Walt Disney World. HERE was affiliated with the AFL-CIO.` |
+  | `query: 多肢离断伤的并发症是什么?` | `document: 失血性休克;血循环危象;急性肾功能衰竭` |
+  | `query: who is the father of kelly taylor's son on 90210` | `document: Kelly Taylor (90210) In 2008, Kelly Taylor returned in the spin-off 90210, now working as a guidance counselor at her alma mater West Beverly Hills High School. It was revealed that in the intervening years, she attained a master's degree and had a son named Sammy with Dylan. She and Dylan ended their relationship soon after. It was also revealed that West Beverly principal Harry Wilson was Kelly's neighbor growing up.[39]` |
+* Loss: [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
+ ```json
+ {'guide': SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ ), 'temperature': 0.01}
+ ```
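To make the `guide` and `temperature` parameters concrete, here is a simplified, dependency-free sketch of the guided in-batch negative idea behind GIST-style losses. It is not the library implementation: the real loss also considers query-query and document-document similarities, and the "Cached" variant adds gradient caching so that large batches fit in memory. The function name and the toy similarity matrices are hypothetical.

```python
import math


def guided_in_batch_loss(student_sim, guide_sim, temperature=0.01):
    """Mean cross-entropy over in-batch candidates, with guide-based masking.

    student_sim[i][j] / guide_sim[i][j]: cosine similarity between query i and
    document j under the trained model and the guide model respectively.
    Document i is the positive for query i.
    """
    losses = []
    for i, (s_row, g_row) in enumerate(zip(student_sim, guide_sim)):
        guide_positive = g_row[i]
        logits = []
        positive_logit = None
        for j, (s, g) in enumerate(zip(s_row, g_row)):
            if j == i:
                positive_logit = s / temperature
                logits.append(positive_logit)
            elif g > guide_positive:
                # The guide rates this "negative" above the labelled positive,
                # so it is treated as a likely false negative and dropped.
                continue
            else:
                logits.append(s / temperature)
        # Numerically stable log-sum-exp over the remaining candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_denominator - positive_logit)
    return sum(losses) / len(losses)


# Two queries, two documents. The student scores document 1 above the positive
# for query 0, and the guide agrees that document 1 is highly relevant.
student = [[0.80, 0.85], [0.10, 0.90]]
guide_masks = [[0.70, 0.95], [0.20, 0.90]]  # guide flags doc 1 for query 0
guide_keeps = [[1.00, 0.00], [0.00, 1.00]]  # guide keeps every negative

loss_masked = guided_in_batch_loss(student, guide_masks)
loss_unmasked = guided_in_batch_loss(student, guide_keeps)
print(loss_masked, loss_unmasked)
```

With a temperature as low as `0.01`, similarity differences are multiplied by 100 before the softmax, so a single unmasked false negative can dominate the loss; masking through the guide model removes that penalty.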
+
+### Evaluation Dataset
+
+#### Unnamed Dataset
+
+
+* Size: 1,000 evaluation samples
+* Columns: `query` and `answer`
+* Approximate statistics based on the first 1000 samples:
+  |      | query  | answer |
+  |:-----|:-------|:-------|
+  | type | string | string |
+* Samples:
+  | query | answer |
+  |:------|:-------|
+  | `query: 微创经皮肾镜手术的推荐药有些什么?` | `document: 阿司匹林` |
+  | `query: why are the fires in ca called the thomas fires` | `document: Thomas Fire On December 4, 2017, the Thomas Fire was reported at 6:26 p.m. PST,[36] to the north of Santa Paula, near Steckel Park and Thomas Aquinas College,[3][24] after which the fire is named.[37] That night, the small brush fire exploded in size and raced through the rugged mountain terrain that lies west of Santa Paula, between Ventura and Ojai.[19][38] Officials blamed strong Santa Ana winds that gusted up to 60 miles per hour (97 km/h) for the sudden expansion.[28][39] Soon after the fire had started, a second blaze was ignited nearly 30 minutes later, about 4 miles (6.4 km) to the north in Upper Ojai at the top of Koenigstein Road.[40] According to eyewitnesses, this second fire was sparked by an explosion in the power line over the area. The second fire was rapidly expanded by the strong Santa Ana winds, and soon merged into the Thomas Fire later that night.[40]` |
+  | `query: which mountain man rediscovered south pass and brought back important information about this trail` | `document: Jedediah Smith Jedediah Strong Smith (January 6, 1799 – May 27, 1831), was a clerk, frontiersman, hunter, trapper, author, cartographer, and explorer of the Rocky Mountains, the North American West, and the Southwest during the early 19th century. After 75 years of obscurity following his death, Smith was rediscovered as the American whose explorations led to the use of the 20-mile (32 km)-wide South Pass as the dominant point of crossing the Continental Divide for pioneers on the Oregon Trail.` |
+* Loss: [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:
+ ```json
+ {'guide': SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ ), 'temperature': 0.01}
+ ```
+
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 2
+- `warmup_ratio`: 0.05
+- `seed`: 12
+- `bf16`: True
+- `prompts`: {'query': 'query: ', 'answer': 'document: '}
+- `batch_sampler`: no_duplicates
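
Two of these settings are easy to misread, so here is a rough, dependency-free sketch. It assumes a single device, no gradient accumulation, and no samples dropped by the `no_duplicates` sampler, none of which is stated in this card; the helper `apply_prompt` and the example text are hypothetical. The first part estimates how many optimizer steps `warmup_ratio` covers, and the second shows the `query: ` / `document: ` prefixes that inputs carry in the widget examples.

```python
import math

# Values taken from this card.
TRAIN_SAMPLES = 129_371
BATCH_SIZE = 128
EPOCHS = 2
WARMUP_RATIO = 0.05
PROMPTS = {"query": "query: ", "answer": "document: "}

# Rough step counts (assumes one device and no gradient accumulation).
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
print(steps_per_epoch, total_steps, warmup_steps)


def apply_prompt(column, text, prompts=PROMPTS):
    """Prefix raw text with the prompt registered for its column (hypothetical helper)."""
    return prompts.get(column, "") + text


# Inputs at inference time would carry the same prefixes used during training.
query = apply_prompt("query", "is ampulla of vater part of the pancreas")
document = apply_prompt("answer", "Ampulla of Vater is formed by the union of two ducts.")
print(query)
print(document)
```

Under these assumptions the learning rate warms up over roughly the first hundred steps out of about two thousand. The actual counts may differ if training ran on several devices or if the sampler discarded batches.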
+
+#### All Hyperparameters
+