Understanding dependency pre-bundling in Vite

优采云 · Published: 2022-06-29 21:43


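  The first feature Vite's official site highlights is its extremely fast cold start of the local dev server. That speed comes in large part from the pre-bundling Vite performs when the dev server starts. Out of curiosity, I spent some time studying how Vite implements pre-bundling, and I am sharing the main ideas here for reference.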

  Why pre-bundle

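  Put simply, the goal is to speed up the cold start of the local dev server. As the Vite docs put it, when a dev server is cold-started, a bundler-based setup has to eagerly crawl and build your entire application before it can serve anything. As the application grows, bundling slows down noticeably, and the local server takes longer to start.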

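  To speed up dev server startup, Vite introduced a pre-bundling step and chose esbuild as the tool for it. esbuild is written in Go and builds 10-100 times faster than bundlers written in JavaScript. With dependencies pre-bundled, business code is loaded on demand through the browser's native ESM support and transformed on the fly as it is requested. Combined with caching, this greatly reduces server startup time.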

  

  The pre-bundling process

  1. Finding dependencies

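  On the first start of the local server, Vite automatically crawls your source code, finds the dependencies that need pre-bundling, and finally returns a deps object like the following: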

    {
      vue: '/path/to/your/project/node_modules/vue/dist/vue.runtime.esm-bundler.js',
      'element-plus': '/path/to/your/project/node_modules/element-plus/es/index.mjs',
      'vue-router': '/path/to/your/project/node_modules/vue-router/dist/vue-router.esm-bundler.js'
    }

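  Concretely, Vite calls esbuild's build API with index.html as the scan entry (entryPoints) and collects every module that resolves into node_modules, plus every module listed in the optimizeDeps.include option of the config file.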
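  For orientation, here is a minimal sketch of where optimizeDeps.include sits in a project's config file. The package names are hypothetical placeholders, not taken from the article; only the option names (optimizeDeps.include and optimizeDeps.exclude) come from Vite itself.

```typescript
// vite.config.ts
import { defineConfig } from 'vite'

export default defineConfig({
  optimizeDeps: {
    // Hypothetical package name: force this module into the pre-bundle
    // even if the scanner would not pick it up by itself.
    include: ['some-linked-package'],
    // Hypothetical package name: keep this module out of the pre-bundle.
    exclude: ['some-esm-only-package']
  }
})
```

  Modules listed in include are the ones the scanner excerpt checks with include?.includes(id), and exclude is what the bundling step in part 2 passes to esbuild as external.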

    // ...other code omitted
    if (explicitEntryPatterns) {
      entries = await globEntries(explicitEntryPatterns, config)
    } else if (buildInput) {
      const resolvePath = (p: string) => path.resolve(config.root, p)
      if (typeof buildInput === 'string') {
        entries = [resolvePath(buildInput)]
      } else if (Array.isArray(buildInput)) {
        entries = buildInput.map(resolvePath)
      } else if (isObject(buildInput)) {
        entries = Object.values(buildInput).map(resolvePath)
      } else {
        throw new Error('invalid rollupOptions.input value.')
      }
    } else {
      // Key point: fall back to html files as the scan entries
      entries = await globEntries('**/*.html', config)
    }

    // ...other code omitted
    build.onResolve(
      {
        // avoid matching windows volume
        filter: /^[\w@][^:]/
      },
      async ({ path: id, importer }) => {
        const resolved = await resolve(id, importer)
        if (resolved) {
          // modules from node_modules or listed in include
          if (resolved.includes('node_modules') || include?.includes(id)) {
            // dependency or forced included, externalize and stop crawling
            if (isOptimizable(resolved)) {
              // Key point: record the dependencies eligible for pre-bundling;
              // depImports is the dependency object returned to the caller
              depImports[id] = resolved
            }
            return externalUnlessEntry({ path: id })
          } else if (isScannable(resolved)) {
            const namespace = htmlTypesRE.test(resolved) ? 'html' : undefined
            // linked package, keep crawling
            return {
              path: path.resolve(resolved),
              namespace
            }
          } else {
            return externalUnlessEntry({ path: id })
          }
        } else {
          missing[id] = normalizePath(importer)
        }
      }
    )

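  Readers familiar with esbuild may know that the entry file types it supports by default are js, ts, jsx, css, json, base64, dataurl, binary, and file (.png and the like), and html is not among them. So how does Vite use index.html as a bundling entry? It implements its own esbuild plugin, esbuildScanPlugin, to handle file types such as .vue and .html. The plugin reads the HTML content and extracts the scripts inside it into a JS module in ESM format.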
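  The extraction idea can be illustrated with a much-simplified sketch. This is not Vite's code: htmlToEsm and both regular expressions are hypothetical names written for this example, only double- or single-quoted attributes are handled, and inline script bodies are concatenated directly rather than stored as separate virtual modules the way the real plugin does.

```typescript
// Matches <script type="module" ...>...</script>; group 1 is the opening
// tag, group 2 is the inline content. Hypothetical, simplified pattern.
const moduleScriptRE =
  /(<script\b[^>]*\btype\s*=\s*(?:"module"|'module')[^>]*>)([\s\S]*?)<\/script>/gi
// Matches a quoted src attribute inside an opening tag.
const srcAttrRE = /\bsrc\s*=\s*(?:"([^"]+)"|'([^']+)')/i

// Turn the module scripts of an HTML document into one ESM source string.
export function htmlToEsm(html: string): string {
  moduleScriptRE.lastIndex = 0 // the regex is global, so reset its cursor
  let js = ''
  let match: RegExpExecArray | null
  while ((match = moduleScriptRE.exec(html))) {
    const [, openTag, content] = match
    const srcMatch = openTag.match(srcAttrRE)
    if (srcMatch) {
      // External script: express it as an import statement.
      js += `import ${JSON.stringify(srcMatch[1] ?? srcMatch[2])}\n`
    } else if (content.trim()) {
      // Inline script: keep its source as part of the module.
      js += content.trim() + '\n'
    }
  }
  return js
}
```

  Given an index.html whose only module script is src="/src/main.ts", the function returns a single import statement, which is exactly the kind of input esbuild can crawl from.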

  

    // html types (.vue, .html, .svelte, etc.): extract script contents
    build.onResolve({ filter: htmlTypesRE }, async ({ path, importer }) => {
      const resolved = await resolve(path, importer)
      if (!resolved) return
      // It is possible for the scanner to scan html types in node_modules.
      // If we can optimize this html type, skip it so it's handled by the
      // bare import resolve, and recorded as optimization dep.
      if (resolved.includes('node_modules') && isOptimizable(resolved)) return
      return {
        path: resolved,
        namespace: 'html'
      }
    })

    // Works together with the build.onResolve above:
    // extract scripts inside HTML-like files and treat it as a js module
    build.onLoad(
      { filter: htmlTypesRE, namespace: 'html' },
      async ({ path }) => {
        let raw = fs.readFileSync(path, 'utf-8')
        // Avoid matching the content of the comment
        raw = raw.replace(commentRE, '')
        const isHtml = path.endsWith('.html')
        const regex = isHtml ? scriptModuleRE : scriptRE
        regex.lastIndex = 0
        // the extracted js is assembled into a virtual module
        let js = ''
        let scriptId = 0
        let match: RegExpExecArray | null
        while ((match = regex.exec(raw))) {
          const [, openTag, content] = match
          const typeMatch = openTag.match(typeRE)
          const type =
            typeMatch && (typeMatch[1] || typeMatch[2] || typeMatch[3])
          const langMatch = openTag.match(langRE)
          const lang =
            langMatch && (langMatch[1] || langMatch[2] || langMatch[3])
          // skip type="application/ld+json" and other non-JS types
          if (
            type &&
            !(
              type.includes('javascript') ||
              type.includes('ecmascript') ||
              type === 'module'
            )
          ) {
            continue
          }
          // the default loader is js; ts, tsx and jsx each have a loader of the same name
          let loader: Loader = 'js'
          if (lang === 'ts' || lang === 'tsx' || lang === 'jsx') {
            loader = lang
          }
          const srcMatch = openTag.match(srcRE)
          // for an external script, emit code of the form import 'path/to/some.js'
          if (srcMatch) {
            const src = srcMatch[1] || srcMatch[2] || srcMatch[3]
            js += `import ${JSON.stringify(src)}\n`
          } else if (content.trim()) {
            // The reason why virtual modules are needed:
            // 1. There can be module scripts (`<script context="module">` in Svelte and `<script>` in Vue)
            // or local scripts (`<script>` in Svelte and `<script setup>` in Vue)
            // 2. There can be multiple module scripts in html
            // We need to handle these separately in case variable names are reused between them

            // append imports in TS to prevent esbuild from removing them
            // since they may be used in the template
            const contents =
              content +
              (loader.startsWith('ts') ? extractImportPaths(content) : '')

            // store the extracted script in the scripts object under a key such as
            // xx.vue?id=1, e.g. scripts = { 'xx.vue?id=1': 'js contents' }
            const key = `${path}?id=${scriptId++}`

            if (contents.includes('import.meta.glob')) {
              scripts[key] = {
                // transformGlob already transforms to js
                loader: 'js',
                contents: await transformGlob(
                  contents,
                  path,
                  config.root,
                  loader,
                  resolve,
                  config.logger
                )
              }
            } else {
              scripts[key] = {
                loader,
                contents
              }
            }

            const virtualModulePath = JSON.stringify(
              virtualModulePrefix + key
            )

            const contextMatch = openTag.match(contextRE)
            const context =
              contextMatch &&
              (contextMatch[1] || contextMatch[2] || contextMatch[3])

            // Especially for Svelte files, exports in <script context="module"> means module exports,
            // exports in <script> means component props. To avoid having two same export name from the
            // star exports, we need to ignore exports in <script>
            if (path.endsWith('.svelte') && context !== 'module') {
              js += `import ${virtualModulePath}\n`
            } else {
              // e.g. export * from 'virtual-module:xx.vue?id=1'
              js += `export * from ${virtualModulePath}\n`
            }
          }
        }

        // This will trigger incorrectly if `export default` is contained
        // anywhere in a string. Svelte and Astro files can't have
        // `export default` as code so we know if it's encountered it's a
        // false positive (e.g. contained in a string)
        if (!path.endsWith('.vue') || !js.includes('export default')) {
          js += '\nexport default {}'
        }

        return {
          loader: 'js',
          contents: js
        }
      }
    )

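  From the above we can see that module dependencies from node_modules need to be pre-bundled, for example import ElementPlus from 'element-plus'. One reason is that browsers do not support this kind of bare module import. The other is that, without bundling, a dependency made up of hundreds or thousands of submodules would trigger one HTTP request per import under native ESM loading, and the browser cannot cope with that many requests. Bare module imports therefore have to be bundled and rewritten into relative or absolute path imports that the browser supports, for example: import ElementPlus from '/path/to/.vite/element-plus/es/index.mjs'.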
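  The rewrite described here can be sketched in a few lines. This is an illustration under stated assumptions, not Vite's implementation: isBareImport and rewriteBareImports are hypothetical helpers, the bare-import test reuses the filter pattern from the scanner excerpt above, and a regular expression stands in for the proper import parsing a real tool would need.

```typescript
// Same pattern as the onResolve filter in the scanner excerpt: starts with a
// word character or "@", and the second character is not ":" (so Windows
// volumes such as "C:\..." are not treated as package names).
const bareImportRE = /^[\w@][^:]/

export function isBareImport(id: string): boolean {
  return bareImportRE.test(id)
}

// Replace bare specifiers with the path of their pre-bundled file.
// `optimized` maps a package id to a browser-loadable path (assumed input).
export function rewriteBareImports(
  code: string,
  optimized: Record<string, string>
): string {
  // Hypothetical, simplified pattern for `from '...'` and `import '...'`.
  const specifierRE = /(\bfrom\s+|\bimport\s+)(['"])([^'"]+)\2/g
  return code.replace(
    specifierRE,
    (full: string, lead: string, quote: string, id: string) => {
      const target = isBareImport(id) ? optimized[id] : undefined
      return target ? `${lead}${quote}${target}${quote}` : full
    }
  )
}
```

  Relative imports such as './style.css' fail the bare-import test and are left untouched, so only package-style specifiers are redirected to the cache.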

  2. Bundling the discovered dependencies

  The previous step produced the list of dependencies that need pre-bundling. All that remains is to pass them to esbuild as entryPoints and bundle them.

    // Bundle with esbuild; the entry points are the dependencies found in step 1
    import { build } from 'esbuild'

    // ...other code omitted
    const result = await build({
      absWorkingDir: process.cwd(),
      // flatIdDeps is the object of dependencies to pre-bundle from step 1
      entryPoints: Object.keys(flatIdDeps),
      bundle: true,
      format: 'esm',
      target: config.build.target || undefined,
      external: config.optimizeDeps?.exclude,
      logLevel: 'error',
      splitting: true,
      sourcemap: true,
      // outdir is the output directory for the build; processingCacheDir is
      // not .vite itself but a temporary directory for the build output
      outdir: processingCacheDir,
      ignoreAnnotations: true,
      metafile: true,
      define,
      plugins: [
        ...plugins,
        esbuildDepPlugin(flatIdDeps, flatIdToExports, config, ssr)
      ],
      ...esbuildOptions
    })

    // Write metadata file, delete `deps` folder and rename the new `processing` folder to `deps` in sync
    commitProcessingDepsCacheSync()

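  Vite does not point esbuild's outdir (the output directory for build artifacts) directly at the .vite directory. It first writes the output to a temporary directory. Only after the build completes does it delete the old .vite directory (if there is one) and rename the temporary directory to .vite. The main purpose is to avoid leaving the cache unusable if an error occurs while the build is running.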
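  The same build-then-swap pattern can be shown in isolation. The sketch below uses only Node's fs, os, and path modules; rebuildCache, commitCache, and the "_processing" suffix are hypothetical names chosen for this example, and the build callback stands in for the esbuild run.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Replace the live cache directory with the freshly built one.
export function commitCache(processingDir: string, cacheDir: string): void {
  if (fs.existsSync(cacheDir)) {
    fs.rmSync(cacheDir, { recursive: true, force: true })
  }
  fs.renameSync(processingDir, cacheDir)
}

// Build into a scratch directory and only touch the live cache on success.
export function rebuildCache(
  cacheDir: string,
  build: (outDir: string) => void
): void {
  const processingDir = cacheDir + '_processing' // hypothetical naming
  fs.rmSync(processingDir, { recursive: true, force: true })
  fs.mkdirSync(processingDir, { recursive: true })
  try {
    build(processingDir)
  } catch (err) {
    // A failed build leaves the previous cache exactly as it was.
    fs.rmSync(processingDir, { recursive: true, force: true })
    throw err
  }
  commitCache(processingDir, cacheDir)
}

// Usage: write a metadata file into a cache under the OS temp directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'deps-cache-'))
const demoCache = path.join(demoRoot, 'deps')
rebuildCache(demoCache, (outDir) => {
  fs.writeFileSync(path.join(outDir, '_metadata.json'), '{}')
})
```

  Because the old directory is removed only after the new output is complete, an exception thrown during the build never leaves a half-written cache behind.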

    function commitProcessingDepsCacheSync() {
      // Rewire the file paths from the temporal processing dir to the final deps cache dir
      const dataPath = path.join(processingCacheDir, '_metadata.json')
      writeFile(dataPath, stringifyOptimizedDepsMetadata(metadata))
      // Processing is done, we can now replace the depsCacheDir with processingCacheDir
      if (fs.existsSync(depsCacheDir)) {
        const rmSync = fs.rmSync ?? fs.rmdirSync // TODO: Remove after support for Node 12 is dropped
        rmSync(depsCacheDir, { recursive: true })
      }
      fs.renameSync(processingCacheDir, depsCacheDir)
    }
