Commit df58891

feat(gatsby-source-filesystem): Only generate hashes when a file has changed, and add an option for skipping hashing (#37464)
Co-authored-by: LekoArts <[email protected]>
1 parent 949132b commit df58891
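The commit title describes two behaviours: a hash is generated only when a file has changed, and a new option skips content hashing altogether. The plugin's own source changes are not part of this excerpt, so the following is a minimal sketch of that flow under stated assumptions, not the plugin's implementation. It assumes three things. First, `cache` is any object with `get`/`set`, like `createMockCache` in the updated tests. Second, `hashFile` is any function that returns a file's MD5 digest. Third, `getContentDigest` and `statFingerprint` are hypothetical names. Concatenating `mtimeMs` and `ino` is inferred from the test fixture `fileFastHash = \`123456123456\``, where both mocked stat values are `123456`.

```javascript
// Hypothetical sketch, not the plugin's source.
// Assumption: `cache` has get(key) and set(key, value), sync or async,
// like the mock cache in the tests.
// Assumption: `hashFile(path)` resolves to the file's MD5 digest.

// Cheap fingerprint from stat metadata. The README names mtime and inode;
// concatenating them is inferred from the test fixture.
function statFingerprint(stats) {
  return `${stats.mtimeMs}${stats.ino}`;
}

async function getContentDigest(absolutePath, stats, cache, options, hashFile) {
  const fingerprint = statFingerprint(stats);

  // fastHash: trust the metadata and never read the file contents.
  if (options && options.fastHash) {
    return fingerprint;
  }

  // The key changes whenever mtime or inode changes, so a cache hit means
  // the file has (very likely) not changed since it was last hashed.
  const key = `content-digest:${absolutePath}:${fingerprint}`;
  const cached = await cache.get(key);
  if (cached) {
    return cached;
  }

  // Cache miss: the file is new or has changed, so hash its contents once.
  const digest = await hashFile(absolutePath);
  await cache.set(key, digest);
  return digest;
}
```

Putting the fingerprint in the cache key means that any change to `mtime` or inode yields a new key and forces a rehash. The trade-off is that entries for outdated fingerprints are never overwritten. Keying on the path and storing the fingerprint next to the digest would avoid that.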

5 files changed, 236 additions and 111 deletions

packages/gatsby-source-filesystem/README.md

Lines changed: 47 additions & 29 deletions
@@ -1,35 +1,28 @@
 # gatsby-source-filesystem

-A Gatsby source plugin for sourcing data into your Gatsby application
-from your local filesystem.
+A Gatsby source plugin for sourcing data into your Gatsby application from your local filesystem.

-The plugin creates `File` nodes from files. The various "transformer"
-plugins can transform `File` nodes into various other types of data e.g.
-`gatsby-transformer-json` transforms JSON files into JSON data nodes and
-`gatsby-transformer-remark` transforms markdown files into `MarkdownRemark`
-nodes from which you can query an HTML representation of the markdown.
+The plugin creates `File` nodes from files. The various "transformer" plugins can transform `File` nodes into various other types of data e.g. [`gatsby-transformer-json`](https://www.gatsbyjs.com/plugins/gatsby-transformer-json/) transforms JSON files into JSON data nodes and [`gatsby-transformer-remark`](https://www.gatsbyjs.com/plugins/gatsby-transformer-remark/) transforms markdown files into `MarkdownRemark` nodes from which you can query an HTML representation of the markdown.

 ## Install

-`npm install gatsby-source-filesystem`
+```shell
+npm install gatsby-source-filesystem
+```

 ## How to use

-```javascript
-// In your gatsby-config.js
+You can have multiple instances of this plugin to read source nodes from different locations on your filesystem. Be sure to give each instance a unique `name`.
+
+```js:title=gatsby-config.js
 module.exports = {
   plugins: [
-    // You can have multiple instances of this plugin
-    // to read source nodes from different locations on your
-    // filesystem.
-    //
-    // The following sets up the Jekyll pattern of having a
-    // "pages" directory for Markdown files and a "data" directory
-    // for `.json`, `.yaml`, `.csv`.
     {
       resolve: `gatsby-source-filesystem`,
       options: {
+        // The unique name for each instance
         name: `pages`,
+        // Path to the directory
         path: `${__dirname}/src/pages/`,
       },
     },
@@ -38,7 +31,10 @@ module.exports = {
       options: {
         name: `data`,
         path: `${__dirname}/src/data/`,
-        ignore: [`**/\.*`], // ignore files starting with a dot
+        // Ignore files starting with a dot
+        ignore: [`**/\.*`],
+        // Use "mtime" and "inode" to fingerprint files (to check if file has changed)
+        fastHash: true,
       },
     },
   ],
@@ -47,9 +43,23 @@ module.exports = {

 ## Options

-In addition to the name and path parameters you may pass an optional `ignore` array of file globs to ignore.
+### name
+
+**Required**
+
+A unique name for the `gatsby-source-filesytem` instance. This name will also be a key on the `File` node called `sourceInstanceName`. You can use this e.g. for filtering.
+
+### path
+
+**Required**
+
+Path to the folder that should be sourced. Ideally an absolute path.

-They will be added to the following default list:
+### ignore
+
+**Optional**
+
+Array of file globs to ignore. They will be added to the following default list:

 ```text
 **/*.un~
@@ -62,8 +72,24 @@ They will be added to the following default list:
 ../**/dist/**
 ```

+### fastHash
+
+**Optional**
+
+By default, `gatsby-source-filesystem` creates an MD5 hash of each file to determine if it has changed between sourcing. However, on sites with many large files this can lead to a significant slowdown. Thus you can enable the `fastHash` setting to use an alternative hashing mechanism.
+
+`fastHash` uses the `mtime` and `inode` to fingerprint the files. On a modern OS this can be considered a robust solution to determine if a file has changed, however on older systems it can be unreliable. Therefore it's not enabled by default.
+
+### Environment variables
+
 To prevent concurrent requests overload of `processRemoteNode`, you can adjust the `200` default concurrent downloads, with `GATSBY_CONCURRENT_DOWNLOAD` environment variable.

+In case that due to spotty network, or slow connection, some remote files fail to download. Even after multiple retries and adjusting concurrent downloads, you can adjust timeout and retry settings with these environment variables:
+
+- `GATSBY_STALL_RETRY_LIMIT`, default: `3`
+- `GATSBY_STALL_TIMEOUT`, default: `30000`
+- `GATSBY_CONNECTION_TIMEOUT`, default: `30000`
+
 ## How to query

 You can query file nodes like the following:
@@ -263,7 +289,7 @@ The `createFileNodeFromBuffer` helper accepts a `Buffer`, caches its contents to

 The name of the file can be passed to the `createFileNodeFromBuffer` helper. If no name is given, the content hash will be used to determine the name.

-## Example usage
+#### Example usage

 The following example is adapted from the source of [`gatsby-source-mysql`](https://github.com/malcolm-kee/gatsby-source-mysql):

@@ -338,11 +364,3 @@ function createMySqlNodes({ name, __sql, idField, keys }, results, ctx) {

 module.exports = createMySqlNodes
 ```
-
-## Troubleshooting
-
-In case that due to spotty network, or slow connection, some remote files fail to download. Even after multiple retries and adjusting concurrent downloads, you can adjust timeout and retry settings with these environment variables:
-
-- `GATSBY_STALL_RETRY_LIMIT`, default: `3`
-- `GATSBY_STALL_TIMEOUT`, default: `30000`
-- `GATSBY_CONNECTION_TIMEOUT`, default: `30000`
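The new "Environment variables" section lists four tunables and their defaults. As an illustration only, a helper that resolves them with those defaults could look like the sketch below. `readIntEnv` and `downloadSettings` are hypothetical names, not the plugin's API. The timeout unit (presumably milliseconds) is an assumption, because the README does not state it.

```javascript
// Hypothetical helper (not the plugin's API): parse a positive integer from
// an environment variable. Falls back to the documented default when the
// variable is unset, empty, non-numeric, or not positive.
function readIntEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") {
    return fallback;
  }
  const value = Number.parseInt(raw, 10);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Defaults copied from the README; timeouts are assumed to be milliseconds.
function downloadSettings(env = process.env) {
  return {
    concurrentDownloads: readIntEnv(env, "GATSBY_CONCURRENT_DOWNLOAD", 200),
    stallRetryLimit: readIntEnv(env, "GATSBY_STALL_RETRY_LIMIT", 3),
    stallTimeout: readIntEnv(env, "GATSBY_STALL_TIMEOUT", 30000),
    connectionTimeout: readIntEnv(env, "GATSBY_CONNECTION_TIMEOUT", 30000),
  };
}
```

For instance, `downloadSettings({ GATSBY_STALL_TIMEOUT: "60000" }).stallTimeout` is `60000`. Variables that are not set keep their README defaults.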

packages/gatsby-source-filesystem/package.json

Lines changed: 0 additions & 1 deletion
@@ -12,7 +12,6 @@
     "file-type": "^16.5.4",
     "fs-extra": "^11.1.0",
     "gatsby-core-utils": "^4.5.0-next.0",
-    "md5-file": "^5.0.0",
     "mime": "^3.0.0",
     "pretty-bytes": "^5.6.0",
     "valid-url": "^1.0.9",
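This hunk drops the `md5-file` dependency. The excerpt does not show which helper replaces it. A streaming MD5 digest needs only Node's built-in `crypto` and `fs` modules, as in the sketch below. `md5File` is a hypothetical name, not a function confirmed by this commit. For a file containing `data`, the sketch should produce `8d777f385d3dfec8815d20f7496026dc`, the same value the updated tests use as `fileHash`.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// Hypothetical helper: stream a file through MD5 so that large files are
// hashed chunk by chunk instead of being read into memory all at once.
function md5File(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("md5");
    fs.createReadStream(filePath)
      .on("error", reject)
      .on("data", chunk => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}
```

Even when streamed, this still reads every byte of every file. The new `fastHash` option lets large sites avoid that cost.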

packages/gatsby-source-filesystem/src/__tests__/create-file-node.js

Lines changed: 161 additions & 75 deletions
@@ -5,6 +5,95 @@ const fs = require(`fs-extra`)

 const fsStatBak = fs.stat

+const createMockCache = (get = jest.fn()) => {
+  return {
+    get,
+    set: jest.fn(),
+    directory: __dirname,
+  }
+}
+
+const createMockCreateNodeId = () => {
+  const createNodeId = jest.fn()
+  createNodeId.mockReturnValue(`uuid-from-gatsby`)
+  return createNodeId
+}
+
+// MD5 hash of the file (if the mock below changes this should change)
+const fileHash = `8d777f385d3dfec8815d20f7496026dc`
+
+// mtime + inode (if the mock below changes this should change)
+const fileFastHash = `123456123456`
+
+function testNode(node, dname, fname, contentDigest) {
+  // Sanitize all filenames
+  Object.keys(node).forEach(key => {
+    if (typeof node[key] === `string`) {
+      node[key] = node[key].replace(new RegExp(dname, `g`), `<DIR>`)
+      node[key] = node[key].replace(new RegExp(fname, `g`), `<FILE>`)
+    }
+  })
+  Object.keys(node.internal).forEach(key => {
+    if (typeof node.internal[key] === `string`) {
+      node.internal[key] = node.internal[key].replace(
+        new RegExp(dname, `g`),
+        `<DIR>`
+      )
+      node.internal[key] = node.internal[key].replace(
+        new RegExp(fname, `g`),
+        `<FILE>`
+      )
+    }
+  })
+
+  // Note: this snapshot should update if the mock below is changed
+  expect(node).toMatchInlineSnapshot(`
+    Object {
+      "absolutePath": "<DIR>/f",
+      "accessTime": "1970-01-01T00:02:03.456Z",
+      "atime": "1970-01-01T00:02:03.456Z",
+      "atimeMs": 123456,
+      "base": "f",
+      "birthTime": "1970-01-01T00:02:03.456Z",
+      "birthtime": "1970-01-01T00:02:03.456Z",
+      "birthtimeMs": 123456,
+      "blksize": 123456,
+      "blocks": 123456,
+      "changeTime": "1970-01-01T00:02:03.456Z",
+      "children": Array [],
+      "ctime": "1970-01-01T00:02:03.456Z",
+      "ctimeMs": 123456,
+      "dev": 123456,
+      "dir": "<DIR>",
+      "ext": "",
+      "extension": "",
+      "id": "uuid-from-gatsby",
+      "ino": 123456,
+      "internal": Object {
+        "contentDigest": "${contentDigest}",
+        "description": "File \\"<DIR>/f\\"",
+        "mediaType": "application/octet-stream",
+        "type": "File",
+      },
+      "mode": 123456,
+      "modifiedTime": "1970-01-01T00:02:03.456Z",
+      "mtime": "1970-01-01T00:02:03.456Z",
+      "mtimeMs": 123456,
+      "name": "f",
+      "nlink": 123456,
+      "parent": null,
+      "prettySize": "123 kB",
+      "rdev": 123456,
+      "relativeDirectory": "<DIR>",
+      "relativePath": "<DIR>/f",
+      "root": "",
+      "size": 123456,
+      "sourceInstanceName": "__PROGRAMMATIC__",
+      "uid": 123456,
+    }
+  `)
+}
+
 // FIXME: This test needs to not use snapshots because of file differences
 // and locations across users and CI systems
 describe(`create-file-node`, () => {
@@ -43,93 +132,90 @@ describe(`create-file-node`, () => {
   })

   it(`creates a file node`, async () => {
-    const createNodeId = jest.fn()
-    createNodeId.mockReturnValue(`uuid-from-gatsby`)
+    const createNodeId = createMockCreateNodeId()
+
+    const cache = createMockCache()
+
     return createFileNode(
       path.resolve(`${__dirname}/fixtures/file.json`),
       createNodeId,
-      {}
+      {},
+      cache
     )
   })

   it(`records the shape of the node`, async () => {
     const dname = fs.mkdtempSync(`gatsby-create-file-node-test`).trim()
     try {
       const fname = path.join(dname, `f`)
-      console.log(dname, fname)
       fs.writeFileSync(fname, `data`)
       try {
-        const createNodeId = jest.fn()
-        createNodeId.mockReturnValue(`uuid-from-gatsby`)
-
-        const node = await createFileNode(fname, createNodeId, {})
-
-        // Sanitize all filenames
-        Object.keys(node).forEach(key => {
-          if (typeof node[key] === `string`) {
-            node[key] = node[key].replace(new RegExp(dname, `g`), `<DIR>`)
-            node[key] = node[key].replace(new RegExp(fname, `g`), `<FILE>`)
-          }
-        })
-        Object.keys(node.internal).forEach(key => {
-          if (typeof node.internal[key] === `string`) {
-            node.internal[key] = node.internal[key].replace(
-              new RegExp(dname, `g`),
-              `<DIR>`
-            )
-            node.internal[key] = node.internal[key].replace(
-              new RegExp(fname, `g`),
-              `<FILE>`
-            )
-          }
-        })
-
-        // Note: this snapshot should update if the mock above is changed
-        expect(node).toMatchInlineSnapshot(`
-          Object {
-            "absolutePath": "<DIR>/f",
-            "accessTime": "1970-01-01T00:02:03.456Z",
-            "atime": "1970-01-01T00:02:03.456Z",
-            "atimeMs": 123456,
-            "base": "f",
-            "birthTime": "1970-01-01T00:02:03.456Z",
-            "birthtime": "1970-01-01T00:02:03.456Z",
-            "birthtimeMs": 123456,
-            "blksize": 123456,
-            "blocks": 123456,
-            "changeTime": "1970-01-01T00:02:03.456Z",
-            "children": Array [],
-            "ctime": "1970-01-01T00:02:03.456Z",
-            "ctimeMs": 123456,
-            "dev": 123456,
-            "dir": "<DIR>",
-            "ext": "",
-            "extension": "",
-            "id": "uuid-from-gatsby",
-            "ino": 123456,
-            "internal": Object {
-              "contentDigest": "8d777f385d3dfec8815d20f7496026dc",
-              "description": "File \\"<DIR>/f\\"",
-              "mediaType": "application/octet-stream",
-              "type": "File",
-            },
-            "mode": 123456,
-            "modifiedTime": "1970-01-01T00:02:03.456Z",
-            "mtime": "1970-01-01T00:02:03.456Z",
-            "mtimeMs": 123456,
-            "name": "f",
-            "nlink": 123456,
-            "parent": null,
-            "prettySize": "123 kB",
-            "rdev": 123456,
-            "relativeDirectory": "<DIR>",
-            "relativePath": "<DIR>/f",
-            "root": "",
-            "size": 123456,
-            "sourceInstanceName": "__PROGRAMMATIC__",
-            "uid": 123456,
-          }
-        `)
+        const createNodeId = createMockCreateNodeId()
+
+        const emptyCache = {
+          get: jest.fn(),
+          set: jest.fn(),
+          directory: __dirname,
+        }
+
+        const node = await createFileNode(fname, createNodeId, {}, emptyCache)
+
+        testNode(node, dname, fname, fileHash)
+      } finally {
+        fs.unlinkSync(fname)
+      }
+    } finally {
+      fs.rmdirSync(dname)
+    }
+  })
+
+  it(`records the shape of the node from cache`, async () => {
+    const dname = fs.mkdtempSync(`gatsby-create-file-node-test`).trim()
+    try {
+      const fname = path.join(dname, `f`)
+      fs.writeFileSync(fname, `data`)
+      try {
+        const createNodeId = createMockCreateNodeId()
+
+        const getFromCache = jest.fn()
+        getFromCache.mockReturnValue(fileHash)
+        const cache = createMockCache(getFromCache)
+
+        const nodeFromCache = await createFileNode(
+          fname,
+          createNodeId,
+          {},
+          cache
+        )
+
+        testNode(nodeFromCache, dname, fname, fileHash)
+      } finally {
+        fs.unlinkSync(fname)
+      }
+    } finally {
+      fs.rmdirSync(dname)
+    }
+  })
+
+  it(`records the shape of the fast hashed node`, async () => {
+    const dname = fs.mkdtempSync(`gatsby-create-file-node-test`).trim()
+    try {
+      const fname = path.join(dname, `f`)
+      fs.writeFileSync(fname, `data`)
+      try {
+        const createNodeId = createMockCreateNodeId()
+        const cache = createMockCache()
+
+        const nodeFastHash = await createFileNode(
+          fname,
+          createNodeId,
+          {
+            fastHash: true,
+          },
+          cache
+        )
+
+        testNode(nodeFastHash, dname, fname, fileFastHash)
       } finally {
         fs.unlinkSync(fname)
       }
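The new `testNode` helper builds each `RegExp` directly from `dname` and `fname`, so any regex metacharacter in a path is interpreted rather than matched literally. For example, the backslash that `path.join` produces on Windows turns `\f` into a form-feed escape. A possible non-mutating variant that escapes the paths first is sketched below. `sanitizeNode` and `escapeRegExp` are hypothetical helpers, not part of this commit. Replacing the directory before the file mirrors `testNode`, so the existing inline snapshot (`"absolutePath": "<DIR>/f"`) would still match.

```javascript
// Escape regex metacharacters so a filesystem path is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return a sanitized copy of `node` (the input is not
// mutated). String fields at the top level and in `node.internal` have `dir`
// replaced by <DIR> and then `file` by <FILE>, the same order as testNode.
function sanitizeNode(node, dir, file) {
  const dirRe = new RegExp(escapeRegExp(dir), "g");
  const fileRe = new RegExp(escapeRegExp(file), "g");
  const scrub = value =>
    typeof value === "string"
      ? value.replace(dirRe, "<DIR>").replace(fileRe, "<FILE>")
      : value;
  const sanitizeObject = obj =>
    Object.fromEntries(
      Object.entries(obj).map(([key, value]) => [key, scrub(value)])
    );
  return {
    ...sanitizeObject(node),
    internal: sanitizeObject(node.internal || {}),
  };
}
```

Inside a test this would read `expect(sanitizeNode(node, dname, fname)).toMatchInlineSnapshot(...)`, leaving the node returned by `createFileNode` untouched.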
