/*
Objective:
As Windows 10 File Explorer search has seemed messed-up on my laptop (and on the computers of at least some
others reporting online) since late 2019, try to make something for finding files/folders on my laptop
while waiting for a fix!

Following Apache POI library JARs added to the (NetBeans) project:
  poi-4.1.1
  poi-ooxml-4.1.1
  poi-ooxml-schemas-4.1.1
  poi-scratchpad-4.1.1
  xmlbeans-3.1.0
  commons-compress-1.19
  commons-math3-3.6.1
  commons-collections4-4.4

v11_b2 10Mar2020
Not keeping the changes tried here as they do not seem to improve performance,
just archiving the code as a partial record of tweaks tried.
--Changed from Files.find(...) to Files.walk(...) followed by filter(...) on the returned stream
  so that I can test effect of making stream parallel.
  (Know from v7_i1 that the split into walk-filter per se does not affect performance, as not unexpected)
  Testing (crudely)... as for v11_b1, little or no improvement (see folder "v11_b2 testing")

Items that might be added/addressed later: see comments at bottom of file
*/

import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Font;
import java.awt.GridLayout;
import java.awt.Insets;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
import javax.swing.BorderFactory;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.border.EmptyBorder;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

class FindFileOrFolder_v11_b2
{
    String missingStartMessage = "";
    String missingTargetMessage = "";

    int walkDepth = Integer.MAX_VALUE; // Specifies how many subdirectory levels to go down
    boolean caseSensitive; // Whether search term is treated as case-sensitive (default is no)
    boolean lookInside; // Whether to search for target inside supported file types (txt, csv, doc, docx, xls, xlsx) also

    ExecutorService searchService = Executors.newSingleThreadExecutor();

    // Declaring (and initialising) fields for GUI components...
    JLabel startFolderLabel = new JLabel("Enter the path of the folder from within which you want to (start your) search...");
    JTextArea startFolderTextArea = new JTextArea(2, 60); // Input to specify folder within which to (start) searching
    JButton depthButton = new JButton("Search only in starting folder (do not include subfolders)");
    JButton withinFileButton = new JButton("Search also text within following file types: txt, doc, docx, csv, xls, xlsx");
    JPanel howDeepPanel = new JPanel(new GridLayout()); // To hold the depthButton and withinFileButton
    JPanel wherePanel = new JPanel(new BorderLayout()); // To hold startFolderLabel, startFolderTextArea & howDeepPanel

    JLabel targetNameLabel = new JLabel("Enter search term (name or partial name of files or folders or text within the files)...");
    JTextArea targetNameTextArea = new JTextArea(2, 60); // Input to specify file/folder names for which to search
    JButton caseButton = new JButton("Make search case-sensitive");
    JPanel whatPanel = new JPanel(new BorderLayout()); // To hold targetNameLabel, targetNameTextArea & caseButton

    JPanel inputsPanel = new JPanel(new BorderLayout()); // To hold wherePanel & whatPanel

    JButton findButton = new JButton("Find files/folders");
    JTextArea statusDisplayTextArea = new JTextArea(1, 50); // Displays "in progress" vs "finished"
    JTextArea resultsDisplayTextArea = new JTextArea(40, 100); // Displays output, i.e. paths for files/folders found
    JPanel findAndShowPanel = new JPanel(new BorderLayout()); // To hold findButton & resultsDisplayTextArea

    JFrame frame = new JFrame("FindFileOrFolder"); // To hold all above (sub)panels

    FindFileOrFolder_v11_b2() // Constructor, called when main method runs
    {
        startFolderLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        startFolderTextArea.setLineWrap(true);
        startFolderTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane startSP = new JScrollPane(startFolderTextArea);
        startSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));

        depthButton.addActionListener(actionEvent ->
        {
            if (walkDepth == Integer.MAX_VALUE)
            {
                walkDepth = 1;
                depthButton.setText("Revert to searching in subfolders also");
            }
            else // (walkDepth is 1)
            {
                walkDepth = Integer.MAX_VALUE;
                depthButton.setText("Revert to searching in starting folder only");
            }
        }
        ); // Toggles walk depth between all-subfolders (starting state) and no-subfolders
        howDeepPanel.add(depthButton, BorderLayout.WEST);

        withinFileButton.addActionListener(actionEvent ->
        {
            if (lookInside == false) // (Could rewrite as !lookInside)
            {
                lookInside = true;
                withinFileButton.setText("Revert to not searching also text within following file types (txt, doc, docx, csv, xls, xlsx)");
            }
            else // (lookInside is true)
            {
                lookInside = false;
                withinFileButton.setText("Revert to searching also text within following file types: txt, doc, docx, csv, xls, xlsx");
            }
        }
        ); // Toggles between searching just file names (starting state) and text within the supported file types also
        howDeepPanel.add(withinFileButton, BorderLayout.EAST);

        wherePanel.setBackground(new Color(245, 245, 245));
        wherePanel.add(startFolderLabel, BorderLayout.NORTH);
        wherePanel.add(startSP, BorderLayout.CENTER);
        wherePanel.add(howDeepPanel, BorderLayout.SOUTH);

        targetNameLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        targetNameTextArea.setLineWrap(true);
        targetNameTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane targetNameSP = new JScrollPane(targetNameTextArea);
        targetNameSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));

        caseButton.addActionListener(actionEvent ->
        {
            if (caseSensitive == false)
            {
                caseSensitive = true;
                caseButton.setText("Make search case-insensitive again");
            }
            else // (caseSensitive is true)
            {
                caseSensitive = false;
                caseButton.setText("Make search case-sensitive again");
            }
        }
        ); // Toggles search between case-insensitive (starting state) and case-sensitive

        whatPanel.setBackground(new Color(245, 245, 245));
        whatPanel.add(targetNameLabel, BorderLayout.NORTH);
        whatPanel.add(targetNameSP, BorderLayout.CENTER);
        whatPanel.add(caseButton, BorderLayout.SOUTH);

        inputsPanel.add(wherePanel, BorderLayout.NORTH);
        inputsPanel.add(whatPanel, BorderLayout.SOUTH);

        findButton.addActionListener(actionEvent ->
        {   // (The 'actionPerformed' method body of the implicit ActionListener...)
            searchService.shutdownNow(); // Interrupt the previous search thread if still going
            try {searchService.awaitTermination(1000, TimeUnit.MILLISECONDS);}
            catch (InterruptedException ex){System.out.println(ex);}
            // Delay next statement (clearing of resultsDisplayTextArea) until previous thread stops
            resultsDisplayTextArea.setText(null); // Clear any previous text

            String startText = startFolderTextArea.getText().trim();
            Path startPath = null; // Path to folder within which to (start) searching
            boolean startPathValid = true;
            try
            {
                startPath = Paths.get(startText); // ...for some start inputs on this call (?what did I mean?)
            }
            catch (Exception e) // To handle possible InvalidPathException
            {
                startPathValid = false;
            }
            Path validatedStartPath = startPath; // (Need an effectively final variable for use later in a lambda)
            if (startPathValid) // Even if real path input, want to check that it is a folder (cf a file), so reassign to...
            {
                startPathValid = Files.isDirectory(startPath); // (as empty string arg seems to generate Path regarded
                // as valid (root/current folder?), so for now including check re startText.isEmpty() below)
            }

            String targetText = targetNameTextArea.getText(); // (Partial) names of files/folders for which to search
            final String target = caseSensitive ? targetText : targetText.toLowerCase();

            if (startText.isEmpty() || !startPathValid)
            {
                missingStartMessage = "No valid start path supplied" + "\n";
            }
            if (target.isEmpty())
            {
                missingTargetMessage = "No search term supplied" + "\n";
            }

            if (!startText.isEmpty() && startPathValid && !target.isEmpty())
            {   // Only run process below if target text and valid start path have been supplied (avoid wasteful processing)
                statusDisplayTextArea.setText("Search in progress...");
                resultsDisplayTextArea.setForeground(Color.BLACK);
                resultsDisplayTextArea.setText(null); // Clear any previous text before displaying the results

                long startSearch = System.currentTimeMillis(); // --temporary----------------------------------------------------
                // ...for testing - to start crude measurement of time to execute search

                searchService = Executors.newSingleThreadExecutor();
                searchService.execute(()->
                {
                    // --changing from Files.find(...) to walk(...) followed by filter(...)-----------------------------------------
                    // try (Stream<Path> targetStream = Files.walk(validatedStartPath, walkDepth);)
                    // ...or the parallel version...
                    try (Stream<Path> targetStream = Files.walk(validatedStartPath, walkDepth).parallel();)
                    // --------------------
                    {
                        targetStream
                            .filter(p -> searchService.isShutdown() ||
                                    // Naughty workaround to make sure that even if no hits found yet a path is sent
                                    // to the terminal method which in this case (allMatch(...) below) is set up to
                                    // end the stream if search thread has been interrupted i.e. service shutdown
                                    !isHiddenHandler(p) // To exclude hidden (temporary etc) files
                                    &&
                                    (
                                        toStringHandled(p.getFileName()).contains(target) // Target in file/folder name...
                                        || (lookInside ? fileContainsTarget(p, target) : false) // ...or target within file text
                                    )
                            )
                            .peek(p -> resultsDisplayTextArea.append(p + "\n")) // Print paths found in output/display area
                            .allMatch(p -> !searchService.isShutdown()); // Stop stream if user shuts down search

                        if (resultsDisplayTextArea.getText().equals(""))
                        {
                            resultsDisplayTextArea.setForeground(Color.BLUE);
                            resultsDisplayTextArea.setText("No results found");
                        }
                        // (Seems inelegant, but have not thought of a way to do directly from the stream code yet)
                        statusDisplayTextArea.setText("Search completed");
                        // --temporary, for testing...-------------------------------------------------------------------------------------
                        System.out.println("Search time in milliseconds approx: " + (System.currentTimeMillis() - startSearch));
                    }
                    catch (IOException ex)
                    {
                        if (ex.getClass().getName().equals("java.nio.file.NoSuchFileException"))
                        {
                            missingInputsMessage();
                        }
                        // (Probably not needed since early version, though, as should not get to try clause without valid start path)
                    }
                });
            }
            else
            {
                missingInputsMessage();
            }
        }   // (...end of 'actionPerformed' method body of the implicit ActionListener)
        );

        findButton.setMargin(new Insets(10, 10, 10, 10));
        findButton.setFont(new Font("SansSerif", Font.BOLD, 20));
        findAndShowPanel.add(findButton, BorderLayout.NORTH);

        statusDisplayTextArea.setEditable(false);
        statusDisplayTextArea.setLineWrap(true);
        statusDisplayTextArea.setWrapStyleWord(true);
        statusDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        statusDisplayTextArea.setFont(new Font("SERIF", Font.BOLD, 20));
        statusDisplayTextArea.setForeground(Color.GREEN);
        findAndShowPanel.add(new JScrollPane(statusDisplayTextArea), BorderLayout.CENTER);

        resultsDisplayTextArea.setEditable(false);
        resultsDisplayTextArea.setLineWrap(true);
        resultsDisplayTextArea.setWrapStyleWord(true);
        resultsDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        findAndShowPanel.add(new JScrollPane(resultsDisplayTextArea), BorderLayout.SOUTH);

        frame.add(inputsPanel, BorderLayout.NORTH);
        frame.add(findAndShowPanel, BorderLayout.SOUTH);
        frame.setResizable(false); // (buttons disappear if user drags frame bottom up,
        // while if it's dragged down, the extra space just appears as a gap between the panels,
        // so just hard-coding big resultsDisplayTextArea for the moment;
        // looking briefly online, see descriptions/code for how to make resizable by dragging, but not trivial)
        frame.setVisible(true);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack(); // (may need to keep this positioned last)
    }

    void missingInputsMessage() // Puts message in display area if one or both user inputs missing/incorrect
    {
        statusDisplayTextArea.setText(null);
        resultsDisplayTextArea.setForeground(Color.red);
        resultsDisplayTextArea.setText(missingStartMessage + missingTargetMessage);
        missingStartMessage = ""; // Reset for future clicks
        missingTargetMessage = ""; // (Ditto)
    }

    boolean isHiddenHandler (Path p) // As Files.isHidden(...) throws checked exception...
    {   // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        boolean result = false;
        try
        {
            result = Files.isHidden(p);
        }
        catch (IOException ex)
        {
            System.out.println("From isHiddenHandler: " + ex);
        }
        return result;
    }

    String toStringHandled (Path p) // As calling toString() on a null Path throws NullPointerException...
    {   // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        String result = "";
        try
        {
            result = p.toString(); // p is filename for start path, and is null if that is root, e.g. C:\
        }
        catch (NullPointerException npEx) // ...in which case NullPointerException is thrown
        {
            resultsDisplayTextArea.setText("Note: As you are starting from the root, "
                + "the search may terminate early if subfolders without access permission are encountered. "
                + "(This is a known issue, and may also affect searches starting further down the hierarchy, "
                + "in which case you will not see feedback, unfortunately)."
                + " May be addressed in a subsequent version of the program."
                + "\n\n");
        }
        if (!caseSensitive) // If user has chosen case-sensitive option, this does not happen...
        {
            result = result.toLowerCase(); // ...file/folder name made all-lowercase (to match the lower-cased search term)
        }
        return result;
    }

    boolean fileContainsTarget (Path p, String target)
    {
        Path name = p.getFileName();
        if (name == null) // (null for a root such as C:\, which has no file name to test)
        {
            return false;
        }
        String fileName = name.toString().toLowerCase();
        if (Files.isReadable(p)) // (Have not found I actually need this isReadable check, but testing has been limited)
        {
            if
            (
                fileName.endsWith(".txt") || // Defining file types in which to search...
                // fileName.endsWith(".java") || // [Added for personal occasional use; have not decided whether to include]
                fileName.endsWith(".csv") // ...and could add any other 'UTF text' types if there are any
            )
            {
                return txtORcsvHasTarget(p, target);
            }
            else if (fileName.endsWith(".docx"))
            {
                return docxHasTarget(p, target);
            }
            else if (fileName.endsWith(".doc"))
            {
                return docHasTarget(p, target);
            }
            else if (fileName.endsWith(".xlsx") || fileName.endsWith(".xls"))
            {
                return xlHasTarget(p, target);
            }
        } // (Or could parse the file extension as a string and use a switch statement instead)
        return false;
    }

    boolean txtORcsvHasTarget (Path p, String target)
    {
        try
        (
            Stream<String> linesFromFile = Files.lines(p, StandardCharsets.ISO_8859_1);
            // Note for future reference: adding second arg StandardCharsets.ISO_8859_1 may avoid
            // MalformedInputException being thrown for non-UTF-encoded files,
            // e.g. docx, xlsx, pdf, which I am not including here as only 'gibberish' symbols are displayed of course
        )
        {
            if (!caseSensitive)
            {
                return linesFromFile.map(String::toLowerCase).anyMatch(s -> s.contains(target));
            }
            return linesFromFile.anyMatch(s -> s.contains(target));
            // General note(s): As anyMatch(...) will return as soon as a match is found (if any)
            // it will not waste resources/time processing subsequent file lines.
            // Order in which lines are processed is not important,
            // so might investigate later if making stream parallel improves speed
        }
        catch (IOException ex)
        {
            System.out.println("From txtORcsvHasTarget, IOException: " + ex);
        }
        catch (Exception ex)
        {
            System.out.println("From txtORcsvHasTarget: " + ex);
        }
        return false;
    }

    boolean docxHasTarget(Path p, String target)
    {
        try (FileInputStream fIS = new FileInputStream(p.toFile())) // try-with-resources so the input stream is closed after reading
        {   // (Is there a more modern API I could use here with Apache POI?)
            XWPFDocument docx = new XWPFDocument(fIS);
            List<XWPFParagraph> paragraphList = docx.getParagraphs(); // (Would be nice to generate a stream rather than
            // read everything into memory before processing, but there does not seem to be a method for that in current Apache POI?)
            // May be a way to do for Excel files when I address them later? http://poi.apache.org/components/spreadsheet/limitations.html
            Stream<String> paragraphStringStream = paragraphList.stream().map(para -> para.getText());
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(String::toLowerCase);
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println("From docxHasTarget, IOException: " + ex);
        }
        catch (org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException ex) // See Note2 at bottom of file
        {
            // System.out.println("org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException from docxHasTarget(...) method, "
            //     + "processing file " + p + " ...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .docx files are encountered
        }
        catch (Exception ex)
        {
            System.out.println("From docxHasTarget: " + ex);
        }
        return false;
    }

    boolean docHasTarget(Path p, String target)
    {
        try (FileInputStream fIS = new FileInputStream(p.toFile())) // try-with-resources so the input stream is closed after reading
        {
            HWPFDocument doc = new HWPFDocument(fIS);
            WordExtractor extractor = new WordExtractor(doc);
            String[] paragraphStrings = extractor.getParagraphText(); // (getText() would extract all text as single String)
            Stream<String> paragraphStringStream = Arrays.stream(paragraphStrings);
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(String::toLowerCase);
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println("From docHasTarget, IOException: " + ex);
        }
        catch (IllegalArgumentException ex) // See Note1 at bottom of file
        {
            // System.out.println("IllegalArgumentException from docHasTarget(...) method, "
            //     + "processing file " + p + ", probably because"
            //     + " a non-doc file with a mis-applied doc extension was encountered...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .doc files are encountered
        }
        catch (Exception ex)
        {
            System.out.println("From docHasTarget: " + ex);
        }
        return false;
    }

    boolean xlHasTarget(Path p, String target)
    {
        try (Workbook workbook = WorkbookFactory.create(p.toFile())) // Instantiating the workbook in try-with-resources
        {   // allows it to be auto-closed whether or not target is found, or in the event of an exception
            // (otherwise, cannot manually open xls files after a search while this program (GUI) is open)
            DataFormatter dataFormatter = new DataFormatter();
            for (Sheet sheet: workbook)
            {
                for (Row row: sheet)
                {
                    for (Cell cell: row)
                    {
                        String cellValue = dataFormatter.formatCellValue(cell);
                        if (!caseSensitive)
                        {
                            cellValue = cellValue.toLowerCase();
                        }
                        if (cellValue.contains(target))
                        {
                            return true;
                        }
                    }
                }
            }
        }
        catch (IOException ex)
        {
            System.out.println("From xlHasTarget, IOException: " + ex);
        }
        catch (Exception ex)
        {
            System.out.println("From xlHasTarget: " + ex);
        }
        return false;
    }
    // (Re docHasTarget & docxHasTarget: is there an equivalent to WorkbookFactory that would
    // allow processing of doc and docx files as undifferentiated 'Document' together?)

    public static void main(String[] args)
    {
        new FindFileOrFolder_v11_b2();
    }
}

/*
Items that might be added/addressed later:

--Test whether making any of the Streams parallel (where possible, i.e. without loss of any beneficial
  output order etc) improves speed (on my system, at least); especially relevant if looking within files.
  So far, have not seen benefit on crude testing of calling parallel() before peek(...) on targetStream
  in findButton event code (previous version, v11_b1), nor of the
  experiment in this version changing from Files.find(...) to Files.walk(...)
  followed by filter(...) on the returned stream to test effect of making stream parallel.

--Fields could be encapsulated, but I don’t think it is worth the extra code complexity at the moment.
  Encapsulation could be considered if the API of subsequent versions of the code were to be used from
  external programs.

--Though program seems to work well enough (as far as my limited testing has ascertained), the way a running
  search is stopped when a new one is started is messy.
  (Could of course make a List/Queue of results before printing it from a loop (instead of stream),
  as that could be broken more straightforwardly with Thread.interrupted() / searchService.isShutdown()
  on starting a new search, but I would rather allow the user to see the results as they are being
  produced by Files.find(…).)
  Further work: perhaps change to FileVisitor with Files.walkFileTree(...) in place of Files.find(...)
  for better control? And/or try SwingWorker?

--Attempt to allow a search to be truncated without exiting the program or starting a new search,
  ideally retaining results found to that point in the display.

--doc/docx files that have target text in tables do not appear in results.
  Apache POI methods used so far do not read inside this formatting? May investigate further.
  Also, might try to support search in further file types, e.g. ppt/pptx, pdf.
  Could also add java source files simply by adding that to list of extensions sent to method
  currently named txtORcsvHasTarget.

--Maybe try to address known issue that searches with search-in-subfolders enabled from some start paths
  near the root, e.g. C:\Users, and even C:\Users\[my user name]\Documents on my system, terminate prematurely.
  (However, though the user does not receive feedback and may not realise that not all files which should
  be found are, the program does not crash and will respond normally to a 'regular' subsequent search.)
  Cause = AccessDeniedException thrown due to denial of access to some folders.
  Might not be able to handle this from Files.find(...), as used currently, or Files.walk(...),
  so perhaps try to instead use walkFileTree + FileVisitor.

--User settings provided by the upper buttons (all but the Find button) could instead be provided by
  dropdowns, radio buttons or checkboxes (+see * below), to make current settings more immediately obvious.
  Checkboxes would be especially suitable as it would be best to allow user to choose which combination
  of file types to include.

--Maybe try to address known issue that searches with search-within-files enabled may be slowed
  if 'inauthentic' files having .doc(/.docx) extensions are encountered.
  See comments in Note1 and Note2 below.
  Way to determine actual file type rather than just parsing extension before sending to Apache POI code?
  Perhaps use walkFileTree + FileVisitor instead of Files.find(...) to skip _vti_cnf folders?

  *Checkboxes might be especially useful to allow user to choose to search within only some
   of the supported file types

--Known issue: Sometimes a given search takes ~10x or more longer than it does if executed subsequently
  once or repeatedly (on my system). Also, the GUI may take a long time to open sometimes.
  Try to find out why and address.

--Maybe address any other notes left in comments re possible reconfigurations.

--Known issue with the xlHasTarget method that (probably) does not affect the user (output to GUI):
  Often get "Cleaning up unclosed ZipFile for archive: [name of my xls/xlsx file]" written to the console
  (System.err, as it's in red?)
  Looked online but still not certain re the cause and if I can do anything.
  Not a JDK Exception/Error per se, as (now-removed) catch clause for Throwable did not capture it.
*/

/*
Other notes:

--Note1: Searching in one of my large folders (note to self - “Archives”) with search-in-subfolders &
  within-files enabled, got
  “java.lang.IllegalArgumentException: The document is really a UNKNOWN file”
  ...coming from HWPFDocument via invocation of my docHasTarget(...) method.
  Looking online, this may be applicable:
  https://stackoverflow.com/questions/4996954/error-in-displaying-a-doc-file-reading-that-from-a-document-on-console-in-java
  comments “That's a typical message of an IllegalArgumentException from the HWPFDocument constructor.
  To the point it means that the supplied file is actually a (Wordpad) RTF file whose .rtf extension
  has incorrectly been renamed to .doc.”
  On handling the exception, I note that many of my 'problem' files have the .docx extension and open with
  MS Word but are within _vti_cnf folders that I inadvertently made in my archive folders years ago,
  and/or were transferred from an Apple Mac years ago, sometimes involving RESOURCE.FRK entity.
  Are some files wrongly flagged as not .doc? How much are things slowed up?

--Note2: I assume the org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException that I see thrown
  (albeit much less frequently from the files in the folder mentioned in Note1) from docxHasTarget(...)
  is somewhat analogous to that of Note1. Also from _vti_cnf folders, as per Note1 above.
*/
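
/*
Sketch re the "walkFileTree + FileVisitor" idea in the items above (not part of the program, and not tried
against the folders mentioned in the notes). As far as I understand the API, Files.walkFileTree(...) reports
an entry that cannot be opened to visitFileFailed(...), so returning CONTINUE there should let the walk carry
on past an AccessDeniedException rather than ending early. The names TolerantWalkSketch, findByName and
nameContains are made up for illustration; the name match here is case-insensitive only, and the hidden-file
check and search-within-files are left out.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TolerantWalkSketch // (hypothetical class, for illustration only)
{
    // Returns paths at or under 'start' whose file/folder name contains 'target' (case-insensitive),
    // skipping any entry that cannot be read instead of ending the whole walk
    static List<Path> findByName(Path start, String target) throws IOException
    {
        String wanted = target.toLowerCase(Locale.ROOT);
        List<Path> hits = new ArrayList<>();
        Files.walkFileTree(start, new SimpleFileVisitor<Path>()
        {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            {
                if (nameContains(dir, wanted)) { hits.add(dir); } // Folders can be hits too
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            {
                if (nameContains(file, wanted)) { hits.add(file); }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException ex)
            {
                // Reached e.g. when a folder cannot be opened (AccessDeniedException)
                System.out.println("Skipped (not accessible): " + file);
                return FileVisitResult.CONTINUE; // Carry on with the rest of the tree
            }
        });
        return hits;
    }

    static boolean nameContains(Path p, String wanted)
    {
        Path name = p.getFileName(); // null for a root, which has no name to test
        return name != null && name.toString().toLowerCase(Locale.ROOT).contains(wanted);
    }
}
```

Possible use: TolerantWalkSketch.findByName(Paths.get("C:/Users"), "report") would return a List of the
matching paths. In the GUI the hits would presumably be appended to resultsDisplayTextArea as they are found
rather than collected first, and a check of searchService.isShutdown() in the visitor methods (returning
FileVisitResult.TERMINATE) might replace the allMatch(...) workaround; neither is attempted here.
*/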