/*
Objective: Since Windows 10 File Explorer search seems messed-up on my laptop (and on the computers of
at least some others reporting online) since late 2019, try to make something for finding files/folders
on my laptop while waiting for a fix!

v7_z (version 7, final iteration to something with new features working well enough to upload executable)
10Feb2020

--Using Apache POI to add doc & docx to the file types that can be searched.
  Following library JARs added to the (NetBeans) project:
      poi-4.1.1
      poi-ooxml-4.1.1
      poi-ooxml-schemas-4.1.1
      poi-scratchpad-4.1.1
      xmlbeans-3.1.0
      commons-compress-1.19
      commons-math3-3.6.1

--Added a small extra display area, statusDisplayTextArea, above the area used to print the search results,
  to show the search status ("In progress", then "Finished").
  In conjunction with use of a regular new thread in place of SwingUtilities.invokeLater(...),
  this allows the status to be shown independently, and individual results to be shown immediately
  (instead of all at once when the search has finished).
  The new arrangement also removes a problem noticed with longer runs associated with addition of
  within-file search, that the user could not shut down the program if it was taking too long
  - at least now the window close button is responsive.
  (Hope to add facility to stop a search without exiting later.)

Items that might be added/addressed later: see comments at bottom of file
*/

import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Font;
import java.awt.GridLayout;
import java.awt.Insets;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;
import javax.swing.BorderFactory;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.border.EmptyBorder;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

class FindFileOrFolder_v7_z
{
    String missingStartMessage = "";
    String missingTargetMessage = "";

    int walkDepth = 1; // --specifies how many subdirectory levels to go down when collecting files to access
                       // initialized to default 1 (do not look in subdirectories)
    boolean caseSensitive; // whether search term is treated as case-sensitive (default is no)
    boolean lookInside; // whether to search for target inside txt and csv files also

    // Declaring fields for GUI components...
    JLabel startFolderLabel = new JLabel("Enter the path of the folder from within which you want to (start your) search...");
    JTextArea startFolderTextArea = new JTextArea(2, 60); // input to specify folder within which to (start) searching
    JButton depthButton = new JButton("Search also in subfolders of the starting folder");
    JButton withinFileButton = new JButton("Search also text within following file types: docx, doc, txt, csv"); // --v7 added "docx, doc"
    JPanel howDeepPanel = new JPanel(new GridLayout()); // to hold the depthButton and withinFileButton
    JPanel wherePanel = new JPanel(new BorderLayout()); // to hold startFolderLabel, startFolderTextArea & howDeepPanel

    JLabel targetNameLabel = new JLabel("Enter the name or partial name of a file or folder you want to find...");
    JTextArea targetNameTextArea = new JTextArea(2, 60); // input to specify file/folder names for which to search
    JButton caseButton = new JButton("Make search case-sensitive");
    JPanel whatPanel = new JPanel(new BorderLayout()); // targetNameLabel, targetNameTextArea & caseButton

    JPanel inputsPanel = new JPanel(new BorderLayout()); // to hold wherePanel & whatPanel

    JButton findButton = new JButton("Find files/folders");
    JTextArea statusDisplayTextArea = new JTextArea(1, 50); // displays "in progress" vs "finished" --v7 added
    JTextArea resultsDisplayTextArea = new JTextArea(40, 100); // displays output, i.e. paths for files/folders found
    JPanel findAndShowPanel = new JPanel(new BorderLayout()); // to hold findButton & resultsDisplayTextArea

    JFrame frame = new JFrame("FindFileOrFolder"); // to hold all above (sub)panels

    FindFileOrFolder_v7_z() // constructor, called when main method runs
    {
        startFolderLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        startFolderTextArea.setLineWrap(true);
        startFolderTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane startSP = new JScrollPane(startFolderTextArea);
        startSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));
        depthButton.addActionListener(actionEvent ->
            {
                if (walkDepth == 1)
                {
                    walkDepth = Integer.MAX_VALUE;
                    depthButton.setText("Search in the starting folder only");
                }
                else // (walkDepth is Integer.MAX_VALUE)
                {
                    walkDepth = 1;
                    depthButton.setText("Search in subfolders of the starting folder also");
                }
            }
        ); // toggles walk depth between no-subfolders (starting state) and all-subfolders
        howDeepPanel.add(depthButton, BorderLayout.WEST);
        withinFileButton.addActionListener(actionEvent ->
            {
                if (lookInside == false) // (could rewrite as !lookInside)
                {
                    lookInside = true;
                    withinFileButton.setText("Revert to not searching also text within files (docx, doc, txt, csv)"); // --v7 text updated
                }
                else // (lookInside is true)
                {
                    lookInside = false;
                    withinFileButton.setText("Revert to searching also text within following file types: docx, doc, txt, csv"); // --v7 text updated
                }
            }
        ); // toggles between searching just file names and text within any docx/doc/txt/csv files also
        howDeepPanel.add(withinFileButton, BorderLayout.EAST);
        wherePanel.setBackground(new Color(245, 245, 245));
        wherePanel.add(startFolderLabel, BorderLayout.NORTH);
        wherePanel.add(startSP, BorderLayout.CENTER);
        wherePanel.add(howDeepPanel, BorderLayout.SOUTH);

        targetNameLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        targetNameTextArea.setLineWrap(true);
        targetNameTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane targetNameSP = new
            JScrollPane(targetNameTextArea);
        targetNameSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));
        caseButton.addActionListener(actionEvent ->
            {
                if (caseSensitive == false)
                {
                    caseSensitive = true;
                    caseButton.setText("Make search case-insensitive again");
                }
                else // (caseSensitive is true)
                {
                    caseSensitive = false;
                    caseButton.setText("Make search case-sensitive again");
                }
            }
        ); // toggles search between case-insensitive (starting state) and case-sensitive
        whatPanel.setBackground(new Color(245, 245, 245));
        whatPanel.add(targetNameLabel, BorderLayout.NORTH);
        whatPanel.add(targetNameSP, BorderLayout.CENTER);
        whatPanel.add(caseButton, BorderLayout.SOUTH);

        inputsPanel.add(wherePanel, BorderLayout.NORTH);
        inputsPanel.add(whatPanel, BorderLayout.SOUTH);

        findButton.addActionListener(actionEvent ->
            { // (the 'actionPerformed' method body of the implicit ActionListener...)
                String startText = startFolderTextArea.getText().trim();
                Path startPath = null; // path to folder within which to (start) searching
                boolean startPathValid = true;
                try
                {
                    startPath = Paths.get(startText); // ...for some start inputs on this call
                }
                catch (Exception e) // to handle possible InvalidPathException
                {
                    startPathValid = false;
                }
                Path validatedStartPath = startPath; // (need an effectively final variable for use later in a lambda)
                if (startPathValid) // even if real path input, want to check that it is a folder (cf a file), so reassign to...
                {
                    startPathValid = Files.isDirectory(startPath); // (as empty string arg seems to generate Path regarded
                    // as valid (root/current folder?), so for now including check re startText.isEmpty() below)
                }
                String targetText = targetNameTextArea.getText(); // (partial) names of files/folders for which to search
                final String target = caseSensitive ?
                    targetText : targetText.toLowerCase();
                if (startText.isEmpty() || !startPathValid)
                {
                    missingStartMessage = "No valid start path supplied" + "\n";
                }
                if (target.isEmpty())
                {
                    missingTargetMessage = "No search term supplied" + "\n";
                }
                if (!startText.isEmpty() && startPathValid && !target.isEmpty())
                {
                    // Only run process below if target text and valid start path have been supplied (avoid wasteful processing)
                    statusDisplayTextArea.setText("Search in progress..."); // --v7 directed this message to new display area
                    resultsDisplayTextArea.setForeground(Color.BLACK);
                    resultsDisplayTextArea.setText(null); // clear any previous text before displaying the results
                    // --moved from position at start of runnable body below, though position probably does not affect performance

                    // New thread wrap, cf SwingUtilities.invokeLater use, or nothing, allows proper results display
                    // area clearance between searches, and use of window close button to exit if needed
                    new Thread(() ->
                        {
                            try
                            {
                                Stream<Path> targetStream = Files.find(validatedStartPath, walkDepth, (p, a) ->
                                    !isHiddenHandler(p) // to exclude hidden (temporary etc) files
                                    && ( toStringHandled(p.getFileName()).contains(target) // target in file/folder name...
                                         || (lookInside ?
                                             fileContainsTarget(p, target) : false) // or target within file text
                                       )
                                );
                                targetStream.forEach(p -> resultsDisplayTextArea.append(p + "\n")); // print paths found in output/display area
                                if (resultsDisplayTextArea.getText().equals(""))
                                {
                                    resultsDisplayTextArea.setForeground(Color.BLUE);
                                    resultsDisplayTextArea.setText("No results found");
                                } // (seems inelegant, but have not thought of a way to do directly from the stream code yet)
                                statusDisplayTextArea.setText("Search completed"); // --v7 added
                            }
                            catch (IOException ex)
                            {
                                if (ex.getClass().getName().equals("java.nio.file.NoSuchFileException"))
                                {
                                    missingInputsMessage();
                                } // (probably not needed now, though, as should not get to try clause without valid start path)
                            }
                        }
                    ).start();
                }
                else
                {
                    missingInputsMessage();
                }
            }
        );
        findButton.setMargin(new Insets(10, 10, 10, 10));
        findButton.setFont(new Font("SansSerif", Font.BOLD, 20));
        findAndShowPanel.add(findButton, BorderLayout.NORTH);
        // --v7 added
        statusDisplayTextArea.setEditable(false);
        statusDisplayTextArea.setLineWrap(true);
        statusDisplayTextArea.setWrapStyleWord(true);
        statusDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        statusDisplayTextArea.setFont(new Font("SERIF", Font.BOLD, 20));
        statusDisplayTextArea.setForeground(Color.GREEN);
        findAndShowPanel.add(new JScrollPane(statusDisplayTextArea), BorderLayout.CENTER);
        resultsDisplayTextArea.setEditable(false);
        resultsDisplayTextArea.setLineWrap(true);
        resultsDisplayTextArea.setWrapStyleWord(true);
        resultsDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        findAndShowPanel.add(new JScrollPane(resultsDisplayTextArea), BorderLayout.SOUTH);

        frame.add(inputsPanel, BorderLayout.NORTH);
        frame.add(findAndShowPanel, BorderLayout.SOUTH);
        frame.setResizable(false); // (buttons disappear if user drags frame bottom up,
            // while if it's dragged down, the extra space just appears as a gap between the panels,
            // so just hard-coding big resultsDisplayTextArea for the moment;
            // looking briefly online, see descriptions/code for how to make resizable by dragging, but not trivial)
        frame.setVisible(true);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack(); // (may need to keep this positioned last)
    }

    void missingInputsMessage() // puts message in display area if one or both user inputs missing/incorrect
    {
        resultsDisplayTextArea.setForeground(Color.red);
        resultsDisplayTextArea.setText(missingStartMessage + missingTargetMessage);
        missingStartMessage = ""; // reset for future clicks
        missingTargetMessage = ""; // (ditto)
    }

    boolean isHiddenHandler (Path p) // as Files.isHidden(...) throws checked exception...
    { // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        boolean result = false;
        try
        {
            result = Files.isHidden(p);
        }
        catch (IOException ex)
        {
            System.out.println(ex);
        }
        return result;
    }

    String toStringHandled (Path p) // as Path's toString() can throw NullPointerException...
    { // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        String result = "";
        try
        {
            result = p.toString(); // p is filename for start path, and is null if that is root, e.g. C:\
        }
        catch (NullPointerException npEx) // ...in which case NullPointerException is thrown
        {
            resultsDisplayTextArea.setText("Note: As you are starting from the root, "
                + "the search may terminate early if subfolders without access permission are encountered. "
                + "(This is a known issue, and may also affect searches starting further down the hierarchy, "
                + "in which case you will not see feedback, unfortunately)."
                + " May be addressed in a subsequent version of the program."
                + "\n\n");
        }
        if (!caseSensitive) // if user has chosen case-sensitive option, this does not happen...
        {
            result = result.toLowerCase(); // ...file/folder name made all-lowercase (to match the lowercased search term)
        }
        return result;
    }

    // --v7 altered to delegate check to new method txtORcsvHasTrget(...) if file is txt or csv...
    // (also removed un-needed targetFound variable)
    boolean fileContainsTarget (Path p, String target)
    {
        String fileName = (p.getFileName().toString().toLowerCase());
        if (Files.isReadable(p)) // (have not found I actually need this isReadable check, but testing has been limited)
        {
            if ( ( fileName.endsWith(".txt") || // defining file types in which to search...
                   fileName.endsWith(".csv")    // ...and could add any other 'UTF text' types if there are any
                 )
               )
            {
                return txtORcsvHasTrget(p, target);
            }
            else if (fileName.endsWith(".docx"))
            {
                return docxHasTarget(p, target);
            }
            else if (fileName.endsWith(".doc"))
            {
                return docHasTarget(p, target);
            }
        } // (or could parse the file extension as a string and use a switch statement instead)
        return false;
    }

    // --v7 added (mod from old fileContainsTarget(...) method)...
    boolean txtORcsvHasTrget (Path p, String target)
    {
        try (Stream<String> fileLines = Files.lines(p, StandardCharsets.ISO_8859_1)) // try-with-resources so the file is closed afterwards
        {
            // Note for future reference: adding second arg StandardCharsets.ISO_8859_1 may avoid
            // MalformedInputException being thrown for non-UTF-encoded files,
            // e.g. docx, xlsx, pdf, which I am not including here as only 'gibberish' symbols are displayed of course
            Stream<String> linesFromFile = fileLines;
            if (!caseSensitive)
            {
                linesFromFile = linesFromFile.map(s -> s.toLowerCase()); // or could use String::toLowerCase I think
            }
            return linesFromFile.anyMatch(s -> s.contains(target));
            // General note(s): As anyMatch(...) will return as soon as a match is found (if any)
            // it will not waste resources/time processing subsequent file lines.
            // Order in which lines are processed not important,
            // so might investigate later if making stream parallel improves speed
        }
        catch (IOException ex)
        {
            System.out.println(ex);
        }
        return false;
    }

    // --v7 added...
    boolean docxHasTarget(Path p, String target)
    {
        try (FileInputStream fIS = new FileInputStream(p.toFile())) // (Is there a more modern API I could use here with Apache POI?)
        {
            XWPFDocument docx = new XWPFDocument(fIS);
            List<XWPFParagraph> paragraphList = docx.getParagraphs();
            // (Would be nice to generate a stream rather than read everything into memory before processing,
            // but there does not seem to be a method for that in current Apache POI?)
            // May be a way to do for Excel files when I address them later?
            // http://poi.apache.org/components/spreadsheet/limitations.html
            Stream<String> paragraphStringStream = paragraphList.stream().map(para -> para.getText());
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(s -> s.toLowerCase()); // or could use String::toLowerCase I think
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println(ex);
        }
        catch (org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException ex) // See Note2 at bottom of file
        {
            // System.out.println("org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException from docxHasTarget(...) method, "
            //     + "processing file " + p + " ...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .docx files are encountered
        }
        return false;
    }

    // --v7 added...
    boolean docHasTarget(Path p, String target)
    {
        try (FileInputStream fIS = new FileInputStream(p.toFile())) // try-with-resources so the file is closed afterwards
        {
            HWPFDocument doc = new HWPFDocument(fIS);
            WordExtractor extractor = new WordExtractor(doc);
            String[] paragraphStrings = extractor.getParagraphText(); // (getText() would extract all text as single String)
            Stream<String> paragraphStringStream = Arrays.stream(paragraphStrings);
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(s -> s.toLowerCase()); // or could use String::toLowerCase I think
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println(ex);
        }
        catch (IllegalArgumentException ex) // See Note1 at bottom of file
        {
            // System.out.println("IllegalArgumentException from docHasTarget(...) method, "
            //     + "processing file " + p + ", probably because"
            //     + " a non-doc file with a mis-applied doc extension was encountered...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .doc files are encountered
        }
        return false;
    }

    public static void main(String[] args)
    {
        new FindFileOrFolder_v7_z();
    }
}

/*
Items that might be added/addressed later:
--Attempt to allow a search to be truncated without exiting the program, ideally retaining results found
  to that point in the display; also disallow a new search until current one completes or is stopped,
  or at least arrange for the new search to discard results of the first in the display window.
  (Maybe investigate... Use of thread interrupt()? Use of a keepGoing boolean, printing results via peek
  rather than forEach, using the keepGoing as arg to a terminal allMatch to terminate stream? SwingWorker?)
--Maybe try to address known issue that searches with search-in-subfolders enabled from some start paths
  near the root, e.g. C:\Users, and even C:\Users\[my user name]\Documents on my system, terminate prematurely.
  (However, though the user does not receive feedback and may not realise that not all files which should
  be found are, the program does not crash and will respond normally to a 'regular' subsequent search.)
  Cause = AccessDeniedException thrown due to denial of access to some folders.
  Might not be able to handle this from Files.find(...), as used currently, or Files.walk(...),
  so perhaps try to instead use walkFileTree + FileVisitor.
--User settings provided by the upper buttons (all but the Find button) could instead be provided by
  dropdowns, radio buttons or checkboxes (+see * below), to make current settings more immediately obvious.
  Checkboxes would be especially suitable as it would be best to allow user to choose which combination
  of file types to include.
--Maybe try to address known issue that searches with search-within-files enabled may be slowed
  if 'inauthentic' files having .doc(/.docx) extensions are encountered. See comments in Note1 and Note2 below.
  Way to determine actual file type rather than just parsing extension before sending to Apache POI code?
  Perhaps use walkFileTree + FileVisitor instead of Files.find(...) to skip _vti_cnf folders?
  *Checkboxes might be especially useful to allow user to choose to search within only some of the
  supported file types
--Known issue: Sometimes a given search takes ~10x or more longer than it does if executed subsequently
  once or repeatedly (on my system). Also, the GUI may take a long time to open sometimes.
  Try to find out why and address.
--Test whether making any of the Streams parallel (where possible, i.e. without loss of any beneficial
  output order etc) improves speed (on my system, at least); especially relevant if looking within files.
--Maybe address any other notes left in comments re possible reconfigurations.
*/

/*
Other notes:
--Note1: Searching in one of my large folders (note to self - “Archives”) with search-in-subfolders &
  within-files enabled, got
      “java.lang.IllegalArgumentException: The document is really a UNKNOWN file”
  ...coming from HWPFDocument via invocation of my docHasTarget(...) method.
  Looking online, this may be applicable:
  https://stackoverflow.com/questions/4996954/error-in-displaying-a-doc-file-reading-that-from-a-document-on-console-in-java
  comments “That's a typical message of an IllegalArgumentException from the HWPFDocument constructor.
  To the point it means that the supplied file is actually a (Wordpad) RTF file whose .rtf extension
  has incorrectly been renamed to .doc.”
  On handling the exception, I note that many of my 'problem' files have the .docx extension and open
  with MS Word but are within _vti_cnf folders that I inadvertently made in my archive folders years ago,
  and/or were transferred from an Apple Mac years ago, sometimes involving RESOURCE.FRK entity.
  Are some files wrongly flagged as not .doc? How much are things slowed up?
--Note2: I assume the org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException that I see thrown
  (albeit much less frequently from the files in the folder mentioned in Note1) from docxHasTarget(...)
  is somewhat analogous to that of Note1. Also from _vti_cnf folders, as per Note1 above.
*/
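
/*
Sketch re the "walkFileTree + FileVisitor" and "keepGoing boolean" ideas in the items list above.
This is only a minimal illustration under stated assumptions, not what the program currently does:
it assumes that silently skipping unreadable folders (in visitFileFailed) is acceptable, and that a
shared AtomicBoolean is a reasonable way to stop a search early. TolerantWalkSketch, findByName and
keepGoing are hypothetical names; matching is on file/folder names only and is case-insensitive.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of FindFileOrFolder_v7_z.
final class TolerantWalkSketch
{
    // Collects files/folders under start whose name contains target (case-insensitive).
    // Entries that cannot be visited are skipped rather than ending the whole walk,
    // and the walk stops as soon as keepGoing is set to false by another thread.
    static List<Path> findByName(Path start, String target, AtomicBoolean keepGoing) throws IOException
    {
        String wanted = target.toLowerCase();
        List<Path> hits = new ArrayList<>();
        Files.walkFileTree(start, new SimpleFileVisitor<Path>()
        {
            private FileVisitResult check(Path p)
            {
                if (!keepGoing.get())
                {
                    return FileVisitResult.TERMINATE; // search was cancelled; keep hits found so far
                }
                Path name = p.getFileName(); // null when p is a root folder
                if (name != null && name.toString().toLowerCase().contains(wanted))
                {
                    hits.add(p);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            {
                return check(dir);
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            {
                return check(file);
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException ex)
            {
                // Reached, for example, when a folder cannot be opened (AccessDeniedException):
                // ignore that entry and carry on with the rest of the tree.
                return FileVisitResult.CONTINUE;
            }
        });
        return hits;
    }
}
```

The search thread could pass in a flag that a (future) Stop button sets to false; whatever was found
up to that point is still in the returned list. Hidden-file filtering and within-file search are left
out here to keep the sketch short.
*/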
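
/*
Sketch re "Way to determine actual file type rather than just parsing extension" (see Note1 and Note2).
One possible approach, offered tentatively: look at the first few bytes of a file before handing it
to Apache POI. It relies on the usual file signatures: OLE2 compound files (old .doc, but also
.xls/.ppt) start with D0 CF 11 E0 A1 B1 1A E1, ZIP-based files (.docx, but also any other zip) start
with "PK" 03 04, and RTF files start with "{\rtf". So a match only says the container looks right,
not that the file is definitely a Word document. FileKindSniffer, Kind and sniff are hypothetical
names. (Apache POI appears to have its own FileMagic class for this kind of check; it is not used
here so that the sketch needs only the JDK.)

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of FindFileOrFolder_v7_z.
final class FileKindSniffer
{
    enum Kind { OLE2_CONTAINER, ZIP_CONTAINER, RTF, OTHER }

    private static final byte[] OLE2 =
        {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final byte[] RTF = {'{', '\\', 'r', 't', 'f'};

    // Classifies a file from its leading bytes (8 bytes are enough for the signatures above).
    static Kind sniff(byte[] header)
    {
        if (startsWith(header, OLE2)) return Kind.OLE2_CONTAINER;
        if (startsWith(header, ZIP)) return Kind.ZIP_CONTAINER;
        if (startsWith(header, RTF)) return Kind.RTF;
        return Kind.OTHER;
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    static Kind sniff(Path p) throws IOException
    {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(p))
        {
            int n;
            while (filled < header.length
                   && (n = in.read(header, filled, header.length - filled)) > 0)
            {
                filled += n;
            }
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix)
    {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

fileContainsTarget(...) could then call docHasTarget(...) only when sniff(p) gives OLE2_CONTAINER, and
docxHasTarget(...) only for ZIP_CONTAINER, which might avoid most of the exceptions described in Note1
and Note2. Whether this is actually faster than letting POI throw has not been measured.
*/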