Pre-trained Language Model for Web-Scale Retrieval & Ranking in Baidu Search
Shuaiqiang WANG (Search Science, Baidu)
Outline
1. Background
2. Retrieval
3. Ranking
4. Summary

Background

Retrieval and ranking are two crucial stages in a web-scale search engine:
Query -> Retrieval (over web-scale documents) -> Ranking (over a few hundred or thousand candidates) -> Results

Baidu ERNIE
1. Sun, Y. et al., 2019. ERNIE: Enhanced representation through knowledge integration. arXiv:1904.09223.
2. Sun, Y. et al., 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In AAAI.
Beyond text matching: semantic retrieval & ranking
- Representation-based methods
  - Representation: document semantics as latent vectors
  - Retrieval: nearest neighbor search in the latent space, mapping a query to semantically-related candidates (see the sketch after this list)
- Interaction-based models
  - Ranking: matching over the local interactions
(Picture from: Dai, Andrew M., Christopher Olah, and Quoc V. Le. Document embedding with paragraph vectors. arXiv:1507.07998, 2015.)
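To make the representation-based path concrete, here is a minimal nearest-neighbor search sketch. It is brute-force inner-product search standing in for the approximate nearest-neighbor index a production system would use; the array names and sizes are illustrative, not from the talk.

```python
# Brute-force stand-in for nearest neighbor search in the latent space.
import numpy as np

doc_vecs = np.random.randn(100_000, 128).astype("float32")  # document latent vectors
query_vec = np.random.randn(128).astype("float32")          # query latent vector

scores = doc_vecs @ query_vec      # inner-product semantic relatedness
top_k = np.argsort(-scores)[:10]   # indices of the 10 most related documents
print(top_k)
```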
Challenges
- Semantic retrieval
  - Effectively understand the semantics of queries and documents
  - Large number of low-frequency queries
  - Web-scale retrieval system
- Semantic ranking
  - Expensive computations
  - Ranking-agnostic pre-training

(Figure: ERNIE's masked-sentence pre-training input, with [CLS]/[SEP]-delimited sentences A and B.)

Our contribution: one of the largest applications of PLMs for web-scale retrieval & ranking.
1. Zou, L. et al. Pre-trained Language Model based Ranking in Baidu Search. In KDD 2021.
2. Liu, Y. et al. Pre-trained Language Model for Web-scale Retrieval in Baidu Search. In KDD 2021.
Retrieval

Methodology: Retrieval Model
- Goal: learning query-document semantic relatedness
- Backbone: a bi-encoder (i.e., two-tower) architecture*, with transformers as the query & doc encoders
- Each tower encodes its tokenized input ([CLS] ... [SEP]) and applies CLS-pooling; the retrieval score is computed from the query embedding and the doc embedding
(*Chang, Wei-Cheng, et al. Pre-training tasks for embedding-based large-scale retrieval. arXiv:2002.03932, 2020.)
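A minimal sketch of the two-tower scoring path, assuming toy encoders in place of the ERNIE transformers the talk uses; only the shape of the computation (per-tower encoding with CLS-style pooling, then an inner-product score) follows the slide.

```python
# Minimal bi-encoder (two-tower) retrieval scoring sketch.
import torch
import torch.nn as nn

class TowerEncoder(nn.Module):
    """Toy stand-in for a transformer tower with CLS-pooling."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.pool = nn.Linear(dim, dim)  # plays the role of the CLS-pooling head

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.pool(self.embed(token_ids))

query_tower = TowerEncoder(vocab_size=30_000, dim=128)  # query encoder
doc_tower = TowerEncoder(vocab_size=30_000, dim=128)    # doc encoder

query_ids = torch.randint(0, 30_000, (4, 16))  # batch of 4 tokenized queries
doc_ids = torch.randint(0, 30_000, (4, 64))    # batch of 4 tokenized docs

q = query_tower(query_ids)    # (4, 128) query embeddings
d = doc_tower(doc_ids)        # (4, 128) doc embeddings
scores = (q * d).sum(dim=-1)  # inner-product retrieval scores
```

Because the two towers never attend to each other, doc embeddings can be precomputed offline and only the query tower needs to run at serving time.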
Methodology: Retrieval Model (Poly-attention)
- Goal: learning query-document semantic relatedness
- Poly-attention: bi-encoders with more query-document interaction*
- m learnable attention codes produce multiple embeddings on the query side; each query embedding is scored against the doc embedding (scores s_1, ..., s_m), and the per-code scores are aggregated into the final retrieval score
(*Humeau, Samuel, et al. Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring. arXiv:1905.01969, 2019.)
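A sketch of the query-side poly-attention, following Humeau et al. (2019): m learnable codes attend over the query token states to yield m query embeddings. The slide's aggregation of s_1, ..., s_m is not fully legible in the extracted text, so the max below is an assumption.

```python
# Poly-attention sketch: multiple embeddings on the query side.
import torch

dim, m = 128, 4
codes = torch.nn.Parameter(torch.randn(m, dim))  # learnable attention codes

query_states = torch.randn(2, 16, dim)  # (batch, tokens, dim) encoder token states
doc_embed = torch.randn(2, dim)         # one embedding per document

attn = torch.softmax(query_states @ codes.T / dim ** 0.5, dim=1)  # (2, 16, m)
query_embeds = attn.transpose(1, 2) @ query_states                # (2, m, dim)

per_code = (query_embeds @ doc_embed.unsqueeze(-1)).squeeze(-1)   # (2, m) scores s_1..s_m
score = per_code.max(dim=1).values                                # assumed aggregation
```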
Methodology: Positive & Negative Data Mining
- Mining positives and negatives from different data sources:
  - Search log: positives are user-clicked documents; negatives are non-clicked documents
  - Manually labeled data: positives are high-scored documents; negatives are low-scored documents
- In-batch negative mining: introducing random negatives
  - More aligned with the retrieval task
  - Efficiently scales up the number of negatives
- For a batch of queries q_1, ..., q_n with relevant docs and mined strong negatives: each (q_i, relevant doc) pair is relevant; pairs with strong negatives and with other queries' documents (random negatives) are irrelevant (see the sketch after this list)
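The sketch below shows the in-batch arrangement from the diagram: with n (query, relevant-doc) pairs per batch, the diagonal of the score matrix holds the relevant pairs and every off-diagonal entry acts as a random negative, so growing the batch scales the negatives for free. The loss choice (softmax cross-entropy) is a standard one and an assumption here.

```python
# In-batch random negatives: diagonal = relevant pairs, off-diagonal = negatives.
import torch
import torch.nn.functional as F

q = torch.randn(8, 128)  # batch of query embeddings
d = torch.randn(8, 128)  # matching relevant-doc embeddings

scores = q @ d.T                        # (8, 8): row i scores q_i against every doc
labels = torch.arange(8)                # the positive for q_i is d_i
loss = F.cross_entropy(scores, labels)  # 1 positive vs. 7 in-batch negatives per row
```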
Methodology: Training Paradigm
- Multi-stage training: unsupervised first, then supervised
- Data progresses from a general corpus to task-specific data
Methodology: Embedding Compression
- Model deployment calls for compression and quantization of the doc embeddings
- Compression with an additional FC layer, followed by quantization of the compressed doc embedding (see the sketch below)
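A sketch of the two deployment steps, assuming illustrative dimensions (256 -> 64) and simple symmetric int8 scalar quantization; the talk does not specify the exact scheme.

```python
# Embedding compression (extra FC layer) followed by int8 quantization.
import torch
import torch.nn as nn

compress = nn.Linear(256, 64)       # additional FC compression layer
doc_embed = torch.randn(1000, 256)  # doc embeddings from the encoder

small = compress(doc_embed)           # (1000, 64) compressed embeddings
scale = small.abs().max() / 127.0     # one shared quantization scale
quantized = (small / scale).round().clamp(-127, 127).to(torch.int8)
restored = quantized.float() * scale  # approximate reconstruction at serving time
```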
Methodology: System Workflow
- Deployment: integrating term-based (text matching) & ERNIE-based retrieval
- Unifying results with post-retrieval filtering (a merge sketch follows below)
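One plausible reading of the unification step, sketched below: take the union of term-based and ERNIE-based candidates and apply a post-retrieval relevance filter before handing results to ranking. The scores, threshold, and merge rule are hypothetical.

```python
# Hypothetical merge of term-based and ERNIE-based candidate sets.
def unify(term_hits: dict, ernie_hits: dict, threshold: float = 0.3) -> list:
    merged = {**term_hits, **ernie_hits}             # union (ERNIE score wins on overlap)
    kept = {doc: s for doc, s in merged.items() if s >= threshold}
    return sorted(kept, key=kept.get, reverse=True)  # candidates passed to ranking

term_hits = {"doc_a": 0.8, "doc_b": 0.2}
ernie_hits = {"doc_c": 0.7, "doc_a": 0.9}
print(unify(term_hits, ernie_hits))  # ['doc_a', 'doc_c']
```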
Evaluation (Retrieval)
- Online evaluation metrics: DCG & GSB
- #Good = #queries on which the new system performs better
(Figure: online evaluation results.)
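The slide defines only #Good. The sketch below completes the side-by-side GSB computation with the (#Good - #Bad) / (#Good + #Same + #Bad) form used in the companion KDD papers; treat that formula as an assumption recovered from context rather than from this slide.

```python
# Side-by-side GSB: raters mark each query Good, Same, or Bad for the new system.
def delta_gsb(good: int, same: int, bad: int) -> float:
    return (good - bad) / (good + same + bad)

print(f"{delta_gsb(good=60, same=30, bad=10):.2%}")  # 50.00%
```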
Ranking

Content-aware Pre-trained Language Model: Pyramid-ERNIE

Method          | Time Complexity
Original ERNIE  | O(L * h * (N_q + N_t + N_s)^2)
Pyramid-ERNIE   | O(L_low * h * (N_q + N_t)^2 + L_low * h * N_s^2 + L_high * h * (N_q + N_t + N_s)^2)

Here the L layers are split into L_low lower and L_high upper layers, h is the per-token attention cost, and N_q, N_t, N_s are the query, title, and summary lengths: the lower layers process the query-title pair and the summary separately, and only the upper layers attend over the full concatenation.
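A back-of-envelope check of why the split helps, with illustrative sizes (12 layers split 9 low / 3 high; 24 query+title tokens; 104 summary tokens; h folded into the constant); none of these numbers come from the talk.

```python
# Attention cost of original ERNIE vs. Pyramid-ERNIE (h folded into the constant).
L, L_low, L_high = 12, 9, 3  # total, lower, and upper layer counts
n_qt, n_s = 24, 104          # query+title tokens, summary tokens

original = L * (n_qt + n_s) ** 2
pyramid = L_low * (n_qt ** 2 + n_s ** 2) + L_high * (n_qt + n_s) ** 2
print(original, pyramid, round(pyramid / original, 2))  # the pyramid variant is cheaper
```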
QUery-WeIghted Summary ExTraction (QUIET)
- Each query term carries a weight (Term1: W1, Term2: W2, Term3: W3)
- STEP 1: score every candidate sentence by the summed weights of the query terms it contains; e.g., Sentence1 contains Term1 and Term3, so it scores W1 + W3, while Sentence2 contains only Term1 and scores W1
- STEP 2: choose the sentence with the max score
- STEP 3: remove the selected sentence from the candidate pool and repeat (see the sketch after this list)
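A compact sketch of the three QUIET steps, with toy term weights and substring-level term matching; the real system works over Baidu's tokenization and learned term weights.

```python
# Greedy QUIET-style extraction: score, pick max, remove, repeat.
def extract_summary(term_weights: dict, sentences: list, k: int = 2) -> list:
    candidates = list(sentences)
    summary = []
    for _ in range(min(k, len(candidates))):
        scored = [(sum(w for term, w in term_weights.items() if term in sent), sent)
                  for sent in candidates]  # STEP 1: weighted query-term coverage
        _, best = max(scored)              # STEP 2: sentence with max score
        summary.append(best)
        candidates.remove(best)            # STEP 3: drop it and repeat
    return summary

weights = {"baidu": 0.6, "search": 0.3, "ranking": 0.1}  # toy query-term weights
sentences = ["baidu search serves billions of queries",
             "ranking reorders the retrieved candidates",
             "an unrelated sentence"]
print(extract_summary(weights, sentences))
```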
Finetune with Web-Scale Calibrated Clicks
- Raw clicks: noisy and inconsistent with relevance
- Calibrated clicks: aligning clicks with human labels; a label generator learned from human labels turns raw clicks into calibrated clicks (a sketch follows below)
- Training pipeline: pretrain with general data (General ERNIE) -> post-pretrain with the search log -> finetune with calibrated clicks -> finetune with human labels (ERNIE for Search)
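A toy sketch of the calibration idea: fit a small label generator on the human-labeled pairs, then relabel the web-scale click log with it so fine-tuning targets align with human relevance. The features, model size, and regression loss are all hypothetical.

```python
# Fit a label generator on human labels, then calibrate raw clicks with it.
import torch
import torch.nn as nn
import torch.nn.functional as F

label_generator = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(label_generator.parameters(), lr=1e-2)

human_feats = torch.randn(256, 4)  # behavior features of human-labeled pairs
human_labels = torch.rand(256, 1)  # human relevance grades in [0, 1]

for _ in range(200):               # train the generator against human labels
    opt.zero_grad()
    F.mse_loss(label_generator(human_feats), human_labels).backward()
    opt.step()

click_feats = torch.randn(10_000, 4)                # raw-click features (web scale)
calibrated = label_generator(click_feats).detach()  # calibrated click labels
```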
Finetune with Human Labels

We manually labeled millions of query-document pairs and train the Pyramid-ERNIE with a mixture of pairwise and pointwise loss:

$$\ell(Y, F(q, D)) = \sum_{y_i < y_j} \max\big(0,\; f(q, d_i) - f(q, d_j) + \tau\big) + \lambda \big(\delta(f(q, d_i), y_i) + \delta(f(q, d_j), y_j)\big)$$

where f(q, d) is the model's relevance score, tau is the pairwise margin, delta is a pointwise loss anchoring scores to the human labels y_i, and lambda balances the two terms.
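A direct translation of the loss above into code, assuming squared error for the pointwise term delta and illustrative values for tau and lambda.

```python
# Mixed pairwise (hinge) + pointwise (anchoring) loss for one query's docs.
import torch

def mixed_loss(scores: torch.Tensor, labels: torch.Tensor,
               tau: float = 0.1, lam: float = 0.5) -> torch.Tensor:
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)  # diff[j, i] = f(q,d_i) - f(q,d_j)
    worse = (labels.unsqueeze(0) < labels.unsqueeze(1)).float()  # pairs with y_i < y_j
    pairwise = (worse * torch.clamp(diff + tau, min=0)).sum()    # hinge with margin tau
    pointwise = ((scores - labels) ** 2).sum()  # delta: anchor scores to human labels
    return pairwise + lam * pointwise

scores = torch.tensor([0.2, 0.9, 0.4])  # f(q, d_i)
labels = torch.tensor([0.0, 1.0, 0.5])  # human relevance grades y_i
print(mixed_loss(scores, labels))
```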
Evaluation (Ranking)
- Base: a basic ERNIE-based ranking policy, fine-tuned with a pairwise loss using human-labeled query-document pairs.
- Content-aware Pyramid-ERNIE (CAP): a Pyramid-ERNIE architecture incorporating the query-dependent document summary into the deep contextualization to better capture the relevance between the query and document.
- Relevance-oriented Pre-training (REP): pre-training the Pyramid-ERNIE model with refined large-scale user-behavioral data before fine-tuning it on the task data.
- Human-anchored Fine-tuning (HINT): anchors the ranking model with human-preferred relevance scores.
Model         | ΔDCG@2 | ΔDCG@4 | ΔAB Random | ΔAB Long-Tail | ΔGSB Random | ΔGSB Long-Tail
Base          |   -    |   -    |     -      |       -       |      -      |       -
+CAP          | 0.65%  | 0.76%  |   0.15%    |     0.35%     |    3.50%    |     6.00%
+CAP+REP      | 2.78%  | 1.37%  |   0.58%    |     0.41%     |    5.50%    |     7.00%
+CAP+REP+HINT | 2.85%  | 1.58%  |   0.14%    |     0.45%     |    6.00%    |     7.50%

Improvements are statistically significant (t-test with p < 0.05) over the baseline.
Summary

Conclusion
- PLM-based retrieval and ranking models: ERNIE-based models; multi-stage training paradigm; fully deployed online
- A simple search pipeline: Query -> Retrieval (over web-scale documents) -> Ranking (hundreds or thousands of candidates) -> Results

We are hiring! Please drop a message if interested.

Thank You
Shuaiqiang WANG
