/*
Objective: Since Windows 10 File Explorer search seems messed-up on my laptop (and computers of at least
some others reporting online) since late 2019, try to make something for finding files/folders on my laptop
while waiting for a fix!

Following Apache POI library JARs added to the (NetBeans) project:
    poi-4.1.1
    poi-ooxml-4.1.1
    poi-ooxml-schemas-4.1.1
    poi-scratchpad-4.1.1
    xmlbeans-3.1.0
    commons-compress-1.19
    commons-math3-3.6.1

v8 17Feb2020
--Previously, if the user clicked the Find button again before an existing search had completed, possibly
  with a new search term, the first search finished and printed its results to the display area before the
  new search ran and also printed there. This is inefficient and confusing. In the new version, the first
  search is interrupted and the display area cleared first, though a single result from the previous search
  sometimes gets through (hope to fix this in future versions).
  Another caveat: though the program seems to work well enough (as far as my limited testing has ascertained),
  the way this is done is probably messy (many exceptions thrown, and an Error); not sure interrupt() is
  being used correctly.
  Further work: probably could/should use the Concurrency API instead of directly-coded Threads, and/or
  perhaps change to FileVisitor with Files.walkFileTree(...) in place of Files.find(...) for better control.
Items that might be added/addressed later: see comments at bottom of file.
*/

import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Font;
import java.awt.GridLayout;
import java.awt.Insets;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;
import javax.swing.BorderFactory;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.border.EmptyBorder;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

class FindFileOrFolder_v8
{
    String missingStartMessage = "";
    String missingTargetMessage = "";

    int walkDepth = 1; // --specifies how many subdirectory levels to go down when collecting files to access
                       // initialized to default 1 (do not look in subdirectories)
    boolean caseSensitive; // whether search term is treated as case-sensitive (default is no)
    boolean lookInside; // whether to search for target inside files (docx, doc, txt, csv) also

    Thread searchThread = new Thread(); // --v8 added this, ref to allow arrangement for new instance of thread
                                        // that runs search code to interrupt any previous still-running instance

    // Declaring (and initialising) fields for GUI components...
    JLabel startFolderLabel = new JLabel("Enter the path of the folder from within which you want to (start your) search...");
    JTextArea startFolderTextArea = new JTextArea(2, 60); // input to specify folder within which to (start) searching
    JButton depthButton = new JButton("Search also in subfolders of the starting folder");
    JButton withinFileButton = new JButton("Search also text within following file types: docx, doc, txt, csv");
    JPanel howDeepPanel = new JPanel(new GridLayout()); // to hold the depthButton and withinFileButton
    JPanel wherePanel = new JPanel(new BorderLayout()); // to hold startFolderLabel, startFolderTextArea & howDeepPanel

    JLabel targetNameLabel = new JLabel("Enter the name or partial name of a file or folder you want to find...");
    JTextArea targetNameTextArea = new JTextArea(2, 60); // input to specify file/folder names for which to search
    JButton caseButton = new JButton("Make search case-sensitive");
    JPanel whatPanel = new JPanel(new BorderLayout()); // to hold targetNameLabel, targetNameTextArea & caseButton

    JPanel inputsPanel = new JPanel(new BorderLayout()); // to hold wherePanel & whatPanel

    JButton findButton = new JButton("Find files/folders");
    JTextArea statusDisplayTextArea = new JTextArea(1, 50); // displays "in progress" vs "finished"
    JTextArea resultsDisplayTextArea = new JTextArea(40, 100); // displays output, i.e. paths for files/folders found
    JPanel findAndShowPanel = new JPanel(new BorderLayout()); // to hold findButton & resultsDisplayTextArea

    JFrame frame = new JFrame("FindFileOrFolder"); // to hold all above (sub)panels

    FindFileOrFolder_v8() // constructor, called when main method runs
    {
        startFolderLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        startFolderTextArea.setLineWrap(true);
        startFolderTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane startSP = new JScrollPane(startFolderTextArea);
        startSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));

        depthButton.addActionListener(actionEvent ->
        {
            if (walkDepth == 1)
            {
                walkDepth = Integer.MAX_VALUE;
                depthButton.setText("Search in the starting folder only");
            }
            else // (walkDepth is Integer.MAX_VALUE)
            {
                walkDepth = 1;
                depthButton.setText("Search in subfolders of the starting folder also");
            }
        }); // toggles walk depth between no-subfolders (starting state) and all-subfolders
        howDeepPanel.add(depthButton, BorderLayout.WEST);

        withinFileButton.addActionListener(actionEvent ->
        {
            if (!lookInside)
            {
                lookInside = true;
                withinFileButton.setText("Revert to not searching also text within files (docx, doc, txt, csv)");
            }
            else // (lookInside is true)
            {
                lookInside = false;
                withinFileButton.setText("Revert to searching also text within following file types: docx, doc, txt, csv");
            }
        }); // toggles between searching just file names and text within supported file types also
        howDeepPanel.add(withinFileButton, BorderLayout.EAST);

        wherePanel.setBackground(new Color(245, 245, 245));
        wherePanel.add(startFolderLabel, BorderLayout.NORTH);
        wherePanel.add(startSP, BorderLayout.CENTER);
        wherePanel.add(howDeepPanel, BorderLayout.SOUTH);

        targetNameLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        targetNameTextArea.setLineWrap(true);
        targetNameTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane targetNameSP = new JScrollPane(targetNameTextArea);
        targetNameSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));

        caseButton.addActionListener(actionEvent ->
        {
            if (!caseSensitive)
            {
                caseSensitive = true;
                caseButton.setText("Make search case-insensitive again");
            }
            else // (caseSensitive is true)
            {
                caseSensitive = false;
                caseButton.setText("Make search case-sensitive again");
            }
        }); // toggles search between case-insensitive (starting state) and case-sensitive

        whatPanel.setBackground(new Color(245, 245, 245));
        whatPanel.add(targetNameLabel, BorderLayout.NORTH);
        whatPanel.add(targetNameSP, BorderLayout.CENTER);
        whatPanel.add(caseButton, BorderLayout.SOUTH);

        inputsPanel.add(wherePanel, BorderLayout.NORTH);
        inputsPanel.add(whatPanel, BorderLayout.SOUTH);

        findButton.addActionListener(actionEvent ->
        { // (the 'actionPerformed' method body of the implicit ActionListener...)
            searchThread.interrupt(); // --v8 added, to stop thread from any incomplete previous search [find better way?]

            // --v8 added, temporary console print to aid in troubleshooting
            // System.out.println("Thread ID " + searchThread.getId() + ", getState() returns " + searchThread.getState()
            //         + ", isAlive() reports " + searchThread.isAlive()
            //         + ", isInterrupted reports " + searchThread.isInterrupted());

            resultsDisplayTextArea.setText(null); // --v8 added, to clear any previous text

            synchronized (findButton) // --v8 added, to prevent more than one search running at the same time [find better way?]
            {
                String startText = startFolderTextArea.getText().trim();
                Path startPath = null; // path to folder within which to (start) searching
                boolean startPathValid = true;
                try
                {
                    startPath = Paths.get(startText); // (throws for some start inputs)
                }
                catch (Exception e) // to handle possible InvalidPathException
                {
                    startPathValid = false;
                }
                Path validatedStartPath = startPath; // (need an effectively final variable for use later in a lambda)
                if (startPathValid) // even if a real path was input, want to check that it is a folder (cf a file), so reassign to...
                {
                    startPathValid = Files.isDirectory(startPath); // (an empty string arg seems to generate a Path regarded
                            // as valid (root/current folder?), so for now including check re startText.isEmpty() below)
                }

                String targetText = targetNameTextArea.getText(); // (partial) names of files/folders for which to search
                final String target = caseSensitive ? targetText : targetText.toLowerCase();

                if (startText.isEmpty() || !startPathValid)
                {
                    missingStartMessage = "No valid start path supplied" + "\n";
                }
                if (target.isEmpty())
                {
                    missingTargetMessage = "No search term supplied" + "\n";
                }

                if (!startText.isEmpty() && startPathValid && !target.isEmpty())
                {
                    // Only run process below if target text and valid start path have been supplied (avoid wasteful processing)
                    statusDisplayTextArea.setText("Search in progress...");
                    resultsDisplayTextArea.setForeground(Color.BLACK);
                    resultsDisplayTextArea.setText(null); // clear any previous text before displaying the results

                    searchThread = new Thread(() -> // (--v8 changed to referenced thread)
                    {
                        // (Stream from Files.find(...) is opened in try-with-resources so that it is always closed;
                        // this did not seem to make any difference to the residual-result issue noted below)
                        try (Stream<Path> targetStream = Files.find(validatedStartPath, walkDepth,
                                (p, a) -> !isHiddenHandler(p) // to exclude hidden (temporary etc) files
                                        && (toStringHandled(p.getFileName()).contains(target) // target in file/folder name...
                                            || (lookInside && fileContainsTarget(p, target))))) // ...or target within file text
                        {
                            // targetStream.forEach(p -> resultsDisplayTextArea.append(p + "\n")); // print paths found in output/display area
                            targetStream
                                .peek(p -> resultsDisplayTextArea.append(p + "\n")) // print paths found in output/display area
                                .allMatch(p -> !Thread.interrupted());
                            // --v8 change from printing in forEach to peek followed by the allMatch
                            // (at least in the context of the other changes being present also)
                            // reduces residual printing of results from an interrupted search among the new search
                            // results/message to one. (The peek runs before the allMatch check for each element,
                            // which is probably why one result can still get through.)
                            // (Though if I in addition make the print in the peek conditional on !Thread.interrupted(),
                            // I get the reverse - EVERYTHING in first search gets printed anyway!!!??? Probably because
                            // Thread.interrupted() clears the interrupt flag, so the allMatch check then never sees it.)

                            if (resultsDisplayTextArea.getText().equals(""))
                            {
                                resultsDisplayTextArea.setForeground(Color.BLUE);
                                resultsDisplayTextArea.setText("No results found");
                            } // (seems inelegant, but have not thought of a way to do directly from the stream code yet)
                            statusDisplayTextArea.setText("Search completed");
                        }
                        catch (IOException ex)
                        {
                            if (ex instanceof java.nio.file.NoSuchFileException)
                            {
                                missingInputsMessage();
                            } // (probably not needed now, though, as should not get to try clause without valid start path)
                        }
                    });
                    searchThread.start();
                }
                else
                {
                    missingInputsMessage();
                }
            } // end of synchronized block
        } // (...end of 'actionPerformed' method body of the implicit ActionListener...)
        );

        findButton.setMargin(new Insets(10, 10, 10, 10));
        findButton.setFont(new Font("SansSerif", Font.BOLD, 20));
        findAndShowPanel.add(findButton, BorderLayout.NORTH);

        statusDisplayTextArea.setEditable(false);
        statusDisplayTextArea.setLineWrap(true);
        statusDisplayTextArea.setWrapStyleWord(true);
        statusDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        statusDisplayTextArea.setFont(new Font("SERIF", Font.BOLD, 20));
        statusDisplayTextArea.setForeground(Color.GREEN);
        findAndShowPanel.add(new JScrollPane(statusDisplayTextArea), BorderLayout.CENTER);

        resultsDisplayTextArea.setEditable(false);
        resultsDisplayTextArea.setLineWrap(true);
        resultsDisplayTextArea.setWrapStyleWord(true);
        resultsDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        findAndShowPanel.add(new JScrollPane(resultsDisplayTextArea), BorderLayout.SOUTH);

        frame.add(inputsPanel, BorderLayout.NORTH);
        frame.add(findAndShowPanel, BorderLayout.SOUTH);
        frame.setResizable(false); // (buttons disappear if user drags frame bottom up,
                // while if it's dragged down, the extra space just appears as a gap between the panels,
                // so just hard-coding big resultsDisplayTextArea for the moment;
                // looking briefly online, see descriptions/code for how to make resizable by dragging, but not trivial)
        frame.setVisible(true);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack(); // (may need to keep this positioned last)
    }

    void missingInputsMessage() // puts message in display area if one or both user inputs missing/incorrect
    {
        statusDisplayTextArea.setText(null); // --v8 added (realised should have been present in previous version(s))
        resultsDisplayTextArea.setForeground(Color.red);
        resultsDisplayTextArea.setText(missingStartMessage + missingTargetMessage);
        missingStartMessage = ""; // reset for future clicks
        missingTargetMessage = ""; // (ditto)
    }

    boolean isHiddenHandler(Path p) // as Files.isHidden(...) throws checked exception...
    { // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        boolean result = false;
        try
        {
            result = Files.isHidden(p);
        }
        catch (IOException ex)
        {
            System.out.println("From isHiddenHandler: " + ex);
        }
        return result;
    }

    String toStringHandled(Path p) // as Path's toString() can throw NullPointerException...
    { // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        String result = "";
        try
        {
            result = p.toString(); // p is filename for start path, and is null if that is root, e.g. C:\
        }
        catch (NullPointerException npEx) // ...in which case NullPointerException is thrown
        {
            resultsDisplayTextArea.setText("Note: As you are starting from the root, "
                    + "the search may terminate early if subfolders without access permission are encountered. "
                    + "(This is a known issue, and may also affect searches starting further down the hierarchy, "
                    + "in which case you will not see feedback, unfortunately)."
                    + " May be addressed in a subsequent version of the program."
                    + "\n\n");
        }
        if (!caseSensitive) // if user has chosen case-sensitive option, this does not happen...
        {
            result = result.toLowerCase(); // ...file/folder name made all-lowercase, to match the lowercased search term
        }
        return result;
    }

    boolean fileContainsTarget(Path p, String target)
    {
        String fileName = p.getFileName().toString().toLowerCase();
        if (Files.isReadable(p)) // (have not found I actually need this isReadable check, but testing has been limited)
        {
            if (fileName.endsWith(".txt")        // defining file types in which to search...
                    || fileName.endsWith(".csv")) // ...and could add any other 'UTF text' types if there are any
            {
                return txtORcsvHasTrget(p, target);
            }
            else if (fileName.endsWith(".docx"))
            {
                return docxHasTarget(p, target);
            }
            else if (fileName.endsWith(".doc"))
            {
                return docHasTarget(p, target);
            }
        } // (or could parse the file extension as a string and use a switch statement instead)
        return false;
    }

    boolean txtORcsvHasTrget(Path p, String target)
    {
        // Note for future reference: adding second arg StandardCharsets.ISO_8859_1 may avoid
        // MalformedInputException being thrown for non-UTF-encoded files,
        // e.g. docx, xlsx, pdf, which I am not including here as only 'gibberish' symbols are displayed of course
        try (Stream<String> linesFromFile = Files.lines(p, StandardCharsets.ISO_8859_1)) // (closed automatically)
        {
            Stream<String> lines = caseSensitive ? linesFromFile : linesFromFile.map(String::toLowerCase);
            return lines.anyMatch(s -> s.contains(target));
            // General note(s): As anyMatch(...) will return as soon as a match is found (if any)
            // it will not waste resources/time processing subsequent file lines.
            // Order in which lines are processed is not important,
            // so might investigate later if making stream parallel improves speed
        }
        catch (IOException ex)
        {
            System.out.println("From txtORcsvHasTrget: " + ex);
        }
        return false;
    }

    boolean docxHasTarget(Path p, String target)
    {
        try (FileInputStream fIS = new FileInputStream(p.toFile()); // (Is there a more modern API I could use here with Apache POI?)
             XWPFDocument docx = new XWPFDocument(fIS))
        {
            List<XWPFParagraph> paragraphList = docx.getParagraphs(); // (Would be nice to generate a stream rather than
                    // read everything into memory before processing, but there does not seem to be a method for that
                    // in current Apache POI?)
                    // May be a way to do for Excel files when I address them later?
                    // http://poi.apache.org/components/spreadsheet/limitations.html
            Stream<String> paragraphStringStream = paragraphList.stream().map(XWPFParagraph::getText);
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(String::toLowerCase);
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println("From docxHasTarget: " + ex);
        }
        catch (org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException ex) // See Note2 at bottom of file
        {
            // System.out.println("org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException from docxHasTarget(...) method, "
            //         + "processing file " + p + " ...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .docx files are encountered
        }
        return false;
    }

    // --v7 added...
    boolean docHasTarget(Path p, String target)
    {
        try (FileInputStream fIS = new FileInputStream(p.toFile());
             HWPFDocument doc = new HWPFDocument(fIS))
        {
            WordExtractor extractor = new WordExtractor(doc);
            String[] paragraphStrings = extractor.getParagraphText(); // (getText() would extract all text as single String)
            Stream<String> paragraphStringStream = Arrays.stream(paragraphStrings);
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(String::toLowerCase);
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println("From docHasTarget: " + ex);
        }
        catch (IllegalArgumentException ex) // See Note1 at bottom of file
        {
            // System.out.println("IllegalArgumentException from docHasTarget(...) method, "
            //         + "processing file " + p + ", probably because"
            //         + " a non-doc file with a mis-applied doc extension was encountered...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .doc files are encountered
        }
        return false;
    }

    public static void main(String[] args)
    {
        new FindFileOrFolder_v8();
    }
}

/*
Items that might be added/addressed later:

--Known issue: If the user clicks the Find button again before a previous search has completed, possibly
  with a new search term, although the previous search is discarded now (good), a single result sometimes
  gets through to the new results list.
  Another caveat re v8 code: though the program seems to work well enough (as far as my limited testing has
  ascertained), the way this is done is probably messy (many exceptions thrown, and an Error); not sure
  interrupt() is being used correctly.
  Further work: probably could/should use the Concurrency API instead of directly-coded Threads, and/or
  perhaps change to FileVisitor with Files.walkFileTree(...) in place of Files.find(...) for better control.

--Attempt to allow a search to be truncated without exiting the program or starting a new search,
  ideally retaining results found to that point in the display.

--Maybe try to address known issue that searches with search-in-subfolders enabled from some start paths
  near the root, e.g. C:\Users, and even C:\Users\[my user name]\Documents on my system, terminate prematurely.
  (However, though the user does not receive feedback and may not realise that not all files which should be
  found are, the program does not crash and will respond normally to a 'regular' subsequent search.)
  Cause = AccessDeniedException thrown due to denial of access to some folders.
  Might not be able to handle this from Files.find(...), as used currently, or Files.walk(...),
  so perhaps try to instead use walkFileTree + FileVisitor.

--User settings provided by the upper buttons (all but the Find button) could instead be provided by
  dropdowns, radio buttons or checkboxes (+see * below), to make current settings more immediately obvious.
  Checkboxes would be especially suitable as it would be best to allow the user to choose which combination
  of file types to include.

--Maybe try to address known issue that searches with search-within-files enabled may be slowed if
  'inauthentic' files having .doc(/.docx) extensions are encountered. See comments in Note1 and Note2 below.
  Way to determine actual file type rather than just parsing extension before sending to Apache POI code?
  Perhaps use walkFileTree + FileVisitor instead of Files.find(...) to skip _vti_cnf folders?
  *Checkboxes might be especially useful to allow the user to choose to search within only some of the
  supported file types.

--Known issue: Sometimes a given search takes ~10x or more longer than it does if executed subsequently
  once or repeatedly (on my system). Also, the GUI may take a long time to open sometimes.
  Try to find out why and address.

--Test whether making any of the Streams parallel (where possible, i.e. without loss of any beneficial
  output order etc) improves speed (on my system, at least); especially relevant if looking within files.

--Maybe address any other notes left in comments re possible reconfigurations.
*/

/*
Other notes:

--Note1: Searching in one of my large folders (note to self - “Archives”) with search-in-subfolders &
  within-files enabled, got
      “java.lang.IllegalArgumentException: The document is really a UNKNOWN file”
  ...coming from HWPFDocument via invocation of my docHasTarget(...) method.
  Looking online, this may be applicable:
  https://stackoverflow.com/questions/4996954/error-in-displaying-a-doc-file-reading-that-from-a-document-on-console-in-java
  comments “That's a typical message of an IllegalArgumentException from the HWPFDocument constructor.
  To the point it means that the supplied file is actually a (Wordpad) RTF file whose .rtf extension has
  incorrectly been renamed to .doc.”
  On handling the exception, I note that many of my 'problem' files have the .docx extension and open with
  MS Word but are within _vti_cnf folders that I inadvertently made in my archive folders years ago,
  and/or were transferred from an Apple Mac years ago, sometimes involving a RESOURCE.FRK entity.
  Are some files wrongly flagged as not .doc? How much are things slowed up?

--Note2: I assume the org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException that I see thrown
  (albeit much less frequently from the files in the folder mentioned in Note1) from docxHasTarget(...)
  is somewhat analogous to that of Note1. Also from _vti_cnf folders, as per Note1 above.
*/
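
/*
Sketch for the "walkFileTree + FileVisitor" idea in the notes above. It is not wired into the program and
is only a minimal illustration under assumptions: the class name SafeWalkSketch and its find(...) method
are hypothetical, matching is on file/folder names only (case-insensitive), and hidden files are not
filtered out. It shows two things the notes ask for: visitFileFailed(...) lets the walk skip folders that
cannot be opened (e.g. AccessDeniedException) instead of ending early, and checking the interrupt flag on
each visit lets an interrupted search stop before adding further results.
*/

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SafeWalkSketch
{
    // Hypothetical helper: collects paths under 'start' whose file/folder name contains 'target'.
    static List<Path> find(Path start, String target) throws IOException
    {
        final String needle = target.toLowerCase(Locale.ROOT);
        final List<Path> hits = new ArrayList<>();

        Files.walkFileTree(start, new SimpleFileVisitor<Path>()
        {
            private FileVisitResult check(Path p)
            {
                if (Thread.currentThread().isInterrupted()) // test only; does not clear the flag
                {
                    return FileVisitResult.TERMINATE; // stop the walk before recording anything more
                }
                Path name = p.getFileName(); // null when p is a root such as C:\
                if (name != null && name.toString().toLowerCase(Locale.ROOT).contains(needle))
                {
                    hits.add(p);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            {
                return check(dir); // folders can match too
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            {
                return check(file);
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException ex)
            {
                return FileVisitResult.CONTINUE; // unreadable file or folder: skip it and carry on
            }
        });
        return hits;
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length < 2)
        {
            System.out.println("Usage: java SafeWalkSketch <start folder> <search term>");
            return;
        }
        find(Paths.get(args[0]), args[1]).forEach(System.out::println);
    }
}
```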